How can I easily convert HTML special entities from a standard input stream in Linux?
Solution 1:
Perl is (as always) your friend. I think this will do it:
perl -n -mHTML::Entities -e 'print HTML::Entities::decode_entities($_);'
E.g.:
echo '&quot;test&quot; &amp; test $test ! test @ # $ % ^ &amp; *' | perl -n -mHTML::Entities -e 'print HTML::Entities::decode_entities($_);'
With output:
[someguy@somehost ~]$ echo '&quot;test&quot; &amp; test $test ! test @ # $ % ^ &amp; *' | perl -n -mHTML::Entities -e 'print HTML::Entities::decode_entities($_);'
"test" & test $test ! test @ # $ % ^ & *
Solution 2:
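PHP is well suited to this. This example requires PHP 5 or later. Since php -R passes each input line in $argn without its trailing newline, the newline is echoed back explicitly:
php -R 'echo html_entity_decode($argn), PHP_EOL;' < file.html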
Solution 3:
recode appears to be available in the default package repositories of the major GNU/Linux distributions. For example, to decode HTML entities into UTF-8:
…|recode html..utf8
Solution 4:
With Python 3:
python3 -c 'import html,sys; print(html.unescape(sys.stdin.read()), end="")' < file.html
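The one-liner above reads all of stdin into memory before decoding. For long or continuous streams, a line-by-line variant may be preferable. The following is only a sketch: the function name decode_stream and the file name unescape.py are made up for illustration, and it assumes Python 3.4 or later (where html.unescape exists).

```python
#!/usr/bin/env python3
"""Decode HTML entities from stdin to stdout, one line at a time."""
import html
import sys


def decode_stream(lines):
    """Yield each input line with its HTML entities decoded.

    `lines` is any iterable of strings, e.g. sys.stdin.
    decode_stream is a hypothetical helper name, not a library function.
    """
    for line in lines:
        # html.unescape leaves the trailing newline untouched.
        yield html.unescape(line)


if __name__ == "__main__":
    # Lines are written as they are decoded, so the whole input
    # never has to be held in memory at once.
    sys.stdout.writelines(decode_stream(sys.stdin))
```

Saved as, say, unescape.py, it can be used like the other solutions: python3 unescape.py < file.html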