How to convert HTML entities to readable text?
With Free recode (formerly known as GNU recode):
recode html < file
If you don't have recode or HTML::Entities and only need to decode &#x<hex>; entities, you could do it by hand with:
perl -Mopen=locale -pe 's/&#x([\da-f]+);/chr hex $1/gie'
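The same by-hand approach can be sketched in Python. This is only an illustration of the regex substitution above, not a full entity decoder: the names HEX_ENTITY and decode_hex_entities are made up for the example, and named entities such as &amp; are deliberately left untouched.

```python
import re

# Matches hexadecimal numeric character references such as &#x142;
# (case-insensitive, like the /i flag in the Perl one-liner).
HEX_ENTITY = re.compile(r"&#x([0-9a-f]+);", re.IGNORECASE)


def decode_hex_entities(text: str) -> str:
    """Replace each &#x<hex>; reference with the character it denotes.

    Hypothetical helper for illustration; values above 0x10FFFF would
    make chr() raise ValueError, which is not handled here.
    """
    return HEX_ENTITY.sub(lambda m: chr(int(m.group(1), 16)), text)
```

For instance, decode_hex_entities("chcia&#x142;abym") gives "chciałabym", while decimal (&#322;) and named (&amp;) entities pass through unchanged.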
Based on "How can I decode HTML entities?" on Stack Overflow, a simple Perl solution would be:
perl -Mopen=locale -MHTML::Entities -pe '$_ = decode_entities($_)' email.txt
For example, using your sample text:
$ perl -Mopen=locale -MHTML::Entities -pe '$_ = decode_entities($_)' email.txt
chciałabym zapytać, czy rozważa Pan takze udział w nowych projektach w Warszawie ? Obecnie poszukujemy specjalisty javascript/architekta z bardzo dobrą znajomością Angular.js do projektu, który dotyczy systemu, służącego do monitorowania i zarządzania flotą pojazdów. Zespół, do którego poszukujemy
With -Mopen=locale, I/O is done in the locale's character set. That includes input from email.txt. It looks like email.txt contains only ASCII characters (presumably the whole point of encoding those characters using the &#x<hex>; notation), but if not, you may need to adapt the above to decode that file using the right charset (if it's not the same as the locale's) instead of using open=locale.
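To illustrate the charset point, here is a minimal Python sketch that decodes the input with an explicitly named charset before unescaping the entities. The function name decode_bytes and the ISO-8859-2 default are assumptions made for the example; substitute whatever encoding the file really uses.

```python
import html


def decode_bytes(data: bytes, encoding: str = "iso-8859-2") -> str:
    """Decode raw bytes with an explicit charset, then unescape HTML entities.

    Hypothetical helper: the default encoding is only an example and
    must match the actual encoding of the input.
    """
    text = data.decode(encoding)   # bytes -> str using the stated charset
    return html.unescape(text)     # &#x<hex>;, &#<dec>; and named entities
```

Reading the file in binary mode (open("email.txt", "rb").read()) and passing the result to decode_bytes keeps the input decoding independent of the locale; the output encoding is still whatever the terminal or locale dictates when the string is printed.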
A Python 3.4+ version (html.unescape was added in 3.4), which can be used in a pipe:
python3 -c 'import html, sys; sys.stdout.writelines(html.unescape(l) for l in sys.stdin)' < file