How to match content between specific HTML tags with an attribute using grep?

You can do that by specifying a regex:

grep -E "^<div class=\"Message\">.*</div>$" input_files

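Note that this only prints elements that sit on a single line by themselves: the ^ and $ anchors require the line to start with the opening tag and end with the closing tag. If your tag spans multiple lines, join the lines first and drop the anchors, since the input is then one long line (-o prints only the matched part):

tr '\n' ' ' < input_file | grep -oE "<div class=\"Message\">.*</div>"

Because .* is greedy, this matches from the first opening tag to the last </div> in the file.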

You can't do it reliably with just grep. You need to parse the HTML with an HTML parser.

What if the HTML code has something like:

<!--
<div class="Message">blah blah</div>
-->

You'll get a false hit on that commented-out code. Here are some other examples where a regex-only option will fail you.

Consider using xmlgrep from the XML::Grep Perl module, as discussed here: Extract Title of a html file using grep
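
If Perl isn't an option, here is a minimal sketch of the parser approach using Python's standard html.parser module. The names MessageExtractor and extract_messages are made up for this illustration, and it assumes you only want the text inside each matching div. Commented-out markup never reaches the tag handlers, so the false hit shown above goes away:

```python
from html.parser import HTMLParser


class MessageExtractor(HTMLParser):
    """Hypothetical helper: collect the text of every <div class="Message">."""

    def __init__(self):
        super().__init__()
        self.depth = 0       # <div> nesting depth inside a matching div (0 = outside)
        self.current = []    # text chunks of the div currently being read
        self.messages = []   # finished results

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            # Nested div inside a match: track it so we find the right </div>.
            self.depth += 1
        elif "Message" in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.messages.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_messages(html):
    """Return the text of each <div class="Message"> in the given HTML string."""
    parser = MessageExtractor()
    parser.feed(html)
    parser.close()
    return parser.messages


sample = """<!--
<div class="Message">blah blah</div>
-->
<div class="Message">
  real message
</div>"""

print(extract_messages(sample))  # ['real message']
```

This also handles tags that span multiple lines and divs nested inside the message, which the grep one-liners do not.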

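Here's one way using GNU grep (-P needs a grep built with PCRE support, and the pattern as written expects a space after the opening tag and before the closing tag):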

grep -oP '(?<=<div class="Message"> ).*?(?= </div>)' file

If your tags span multiple lines, try:

< file tr -d '\n' | grep -oP '(?<=<div class="Message"> ).*?(?= </div>)'
