regexp for finding everything between <a> and </a> tags
The standard disclaimer applies: Parsing HTML with regular expressions is not ideal. Success depends on the well-formedness of the input on a character-by-character level. If you cannot guarantee this, the regex will fail to do the Right Thing at some point.
Having said that:
<a\b[^>]*>(.*?)</a> // match group one will contain the link text
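For anyone who wants to try the pattern, here is a minimal sketch. It is written in Python only for illustration (the language and the `link_texts` helper name are my assumptions, not part of the answer), and the two flags are additions so that upper-case tags and multi-line link text are also matched.

```python
import re

# Same pattern as above. IGNORECASE and DOTALL are additions for this sketch:
# they accept <A HREF=...> and link text that spans several lines.
LINK_RE = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)

def link_texts(html):
    """Return the contents of every <a>...</a> pair found in html."""
    # With a single capture group, findall returns a list of group-one strings.
    return LINK_RE.findall(html)

sample = '<p><a href="http://www.stackoverflow.com">Go to stackoverflow.com</a> or <a class="x">stay</a></p>'
print(link_texts(sample))
```

The `\b` keeps tags such as `<abbr>` from matching, and the lazy `.*?` stops at the first `</a>`, so adjacent links are returned separately. It still breaks on input like an attribute value that contains `>`, which is exactly the caveat above.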
I'm a big fan of regexes, but this is not the right place to use them.
Use a real HTML parser.
- Your code will be clearer
- It will be more likely to work
I Googled for a PHP HTML parser, and found this one.
If you know you're working with XHTML, then you could use PHP's standard XML parser.
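To show what the parser route looks like, here is a rough sketch. It uses Python's standard-library `html.parser` as a stand-in for whichever PHP parser you pick; the class and function names are made up for this example.

```python
from html.parser import HTMLParser

class LinkTextCollector(HTMLParser):
    """Collects the text found inside each <a>...</a> element."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished link texts, in document order
        self._depth = 0    # greater than 0 while inside an <a> element
        self._parts = []   # text fragments of the link being read

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.links.append("".join(self._parts))
                self._parts = []

    def handle_data(self, data):
        # Only keep text that appears while an <a> element is open.
        if self._depth > 0:
            self._parts.append(data)

def link_texts_parsed(html):
    """Hypothetical helper: return the text of every link in html."""
    parser = LinkTextCollector()
    parser.feed(html)
    parser.close()
    return parser.links

print(link_texts_parsed('<p><a href="/x">Go <b>now</b></a> and <a>two</a></p>'))
```

Note that this version returns only the text, so markup nested inside the link (the `<b>` here) is dropped. If you need the inner HTML rather than the text, you would have to rebuild it from the start and end tag callbacks.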
<a\b\s*(.*?)>(.*?)</a>
<a href="http://www.stackoverflow.com">Go to stackoverflow.com</a>
$1 = href="http://www.stackoverflow.com"
$2 = Go to stackoverflow.com
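The two-group capture can be tried out as follows. This is a sketch in Python (my choice for illustration; the thread is about PHP), and the pattern used here has a word boundary and lazy quantifiers so that `<abbr>` is skipped and several links on one line are not merged into a single match.

```python
import re

# Group 1: the raw attribute string. Group 2: the link text.
# \b rejects tags like <abbr>; the lazy .*? stops at the first > and first </a>.
ATTRS_AND_TEXT = re.compile(r"<a\b\s*(.*?)>(.*?)</a>", re.DOTALL)

html = '<a href="http://www.stackoverflow.com">Go to stackoverflow.com</a>'
match = ATTRS_AND_TEXT.search(html)
attrs, text = match.groups()
print(attrs)  # href="http://www.stackoverflow.com"
print(text)   # Go to stackoverflow.com
```

Group one is the whole attribute string, not just the URL; pulling out the `href` value alone would need a further step.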
I answered a similar question, about stripping everything except <a> tags, here.