Using natural language processing to extract an address from a tweet

The thread "How to parse freeform street/postal address out of text, and into components" answers the question "Is there a way to isolate an address from the text around it and break it into pieces?" That is essentially the same question as yours, except that you don't care about breaking the address into pieces, only about isolating it from the rest of the text.

SmartyStreets also has a nice demo at https://smartystreets.com/demo?mode=extract , but unfortunately it is not a free solution.

Another quick thought: since Twitter posts are limited to 140 characters and tend to contain few words (your two examples have 9 and 12 words, respectively), you could conceivably just brute-force it. For example, to get the location in "@twitterbot, what's near Yonge & Dundas, Toronto? I'm hungry!", you could send each of the following to the Google geocoder:

what's near Yonge & Dundas, Toronto? I'm hungry!

what's near Yonge & Dundas, Toronto? I'm

what's near Yonge & Dundas, Toronto?

what's near Yonge & Dundas,

and so on, for every possible substring composed of complete words.
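
To make the brute-force idea concrete, here is a minimal sketch of how the candidate strings could be generated. The function name `candidate_substrings` is made up for illustration, and the sketch only builds the strings; sending each one to a geocoder and deciding which result to trust is left out.

```python
def candidate_substrings(text):
    """Return every contiguous run of whole words in text, longest first."""
    words = text.split()
    candidates = []
    # Try the longest runs first, then progressively shorter ones.
    for length in range(len(words), 0, -1):
        for start in range(len(words) - length + 1):
            candidates.append(" ".join(words[start:start + length]))
    return candidates


tweet = "@twitterbot, what's near Yonge & Dundas, Toronto? I'm hungry!"
candidates = candidate_substrings(tweet)
print(len(candidates))  # 9 words give 9 * 10 / 2 = 45 candidates
```

A tweet of n words yields n(n+1)/2 candidates, so a 9-word tweet means 45 geocoder requests and a 12-word tweet means 78. That is small enough to be feasible, but worth keeping in mind if the geocoder you use limits or charges per request.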


Here you go: http://geocoder.ca/?locate=Hey+%40twitterbot%2C+I%27m+looking+for+restaurants+around+123+Main+Street%2C+New+York&geoit=xml&parse=1

<geodata>
  <latt>40.5119365</latt>
  <longt>-74.2493562</longt>
  <AreaCode>347,718</AreaCode>
  <TimeZone>America/New_York</TimeZone>
  <standard>
    <stnumber>123</stnumber>
    <staddress>Main ST</staddress>
    <city>STATEN ISLAND</city>
    <prov>NY</prov>
    <postal>11385</postal>
    <confidence>0.9</confidence>
  </standard>
</geodata>

or http://geocoder.ca/?locate=Hey+%40twitterbot%2C+I%27m+looking+for+restaurants+around+123+Main+Street%2C+New+York
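
If you want to do this from code, a rough sketch in Python might look like the following. It assumes only what is visible above: the `locate`, `geoit` and `parse` query parameters from the URL, and the element names from the sample XML response. The helper names `build_geocoder_url` and `parse_geodata` are made up for illustration, and the sketch parses the sample response as a string rather than making a live request.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_geocoder_url(text):
    """Build the request URL, using the query parameters seen in the URL above."""
    query = urlencode({"locate": text, "geoit": "xml", "parse": "1"})
    return "http://geocoder.ca/?" + query


def parse_geodata(xml_text):
    """Pull the coordinates and address parts out of a <geodata> response."""
    root = ET.fromstring(xml_text)
    standard = root.find("standard")
    return {
        "latitude": float(root.findtext("latt")),
        "longitude": float(root.findtext("longt")),
        "street_number": standard.findtext("stnumber"),
        "street": standard.findtext("staddress"),
        "city": standard.findtext("city"),
        "state": standard.findtext("prov"),
        "postal": standard.findtext("postal"),
        "confidence": float(standard.findtext("confidence")),
    }


# The sample response shown above, pasted in as a string (no network call).
SAMPLE_RESPONSE = """<geodata>
  <latt>40.5119365</latt>
  <longt>-74.2493562</longt>
  <AreaCode>347,718</AreaCode>
  <TimeZone>America/New_York</TimeZone>
  <standard>
    <stnumber>123</stnumber>
    <staddress>Main ST</staddress>
    <city>STATEN ISLAND</city>
    <prov>NY</prov>
    <postal>11385</postal>
    <confidence>0.9</confidence>
  </standard>
</geodata>"""

tweet = "Hey @twitterbot, I'm looking for restaurants around 123 Main Street, New York"
url = build_geocoder_url(tweet)
result = parse_geodata(SAMPLE_RESPONSE)
print(url)
print(result["street_number"], result["street"], result["city"], result["confidence"])
```

The `confidence` field is probably the most useful one for your case: you could treat a low value as "no address found in this tweet", though what threshold makes sense is something you would have to work out by trying real tweets.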