How to match content between specific HTML tags with a given attribute using grep?
You can do that by specifying a regex:
grep -E "^<div class=\"Message\">.*</div>$" input_files
Note that this only prints elements whose opening and closing tags are on the same line. If your tag spans multiple lines, you can try:
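tr '\n' ' ' < input_file | grep -oP "<div class=\"Message\">.*?</div>"
The ^ and $ anchors are dropped here because the joined input is one long line, so they would only match if the whole file were a single div. -o prints just the matched elements, and the lazy .*? (which needs GNU grep's -P) stops at the first closing tag.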
You can't do it reliably with just grep. You need to parse the HTML with an HTML parser.
What if the HTML code has something like:
<!--
<div class="Message">blah blah</div>
-->
You'll get a false hit on that commented-out code. There are other cases where a regex-only approach will fail you.
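To make the parser suggestion concrete, here is a minimal sketch using Python's standard-library html.parser. The names MessageExtractor and extract_messages are made up for this example, and it assumes you only want the text of each div whose class list contains "Message". Because the parser reports comments through a separate callback, the commented-out div is never seen as a tag.

```python
from html.parser import HTMLParser


class MessageExtractor(HTMLParser):
    """Collect the text inside every <div class="Message"> element.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0       # > 0 while inside a matching div (counts nested divs)
        self.messages = []   # text of each completed matching div
        self._chunks = []    # text pieces of the div currently being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            # A div nested inside a matching div: track it so that its
            # closing tag does not end the outer match early.
            self.depth += 1
        elif "Message" in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.messages.append("".join(self._chunks).strip())

    def handle_data(self, data):
        # Only keep text that appears inside a matching div.
        if self.depth:
            self._chunks.append(data)


def extract_messages(html):
    """Return the text of each <div class="Message"> in the given HTML string."""
    parser = MessageExtractor()
    parser.feed(html)
    parser.close()
    return parser.messages


if __name__ == "__main__":
    sample = """<!--
<div class="Message">commented out</div>
-->
<div class="Message">real
message</div>"""
    print(extract_messages(sample))
```

This handles divs that span several lines without any tr preprocessing. It is only a sketch: it keeps the text content and drops any inline markup inside the div, so adapt it if you need the inner HTML.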
Consider using xmlgrep from the XML::Grep Perl module, as discussed here: Extract Title of a html file using grep
Here's one way using GNU grep:
grep -oP '(?<=<div class="Message"> ).*?(?= </div>)' file
If your tags span multiple lines, try:
< file tr -d '\n' | grep -oP '(?<=<div class="Message"> ).*?(?= </div>)'