Haskell source encoding

Unicode is character set. UTF-8, UTF-16 etc are the concrete physical encodings of Unicode codepoints. Try to read here. The difference explained pretty well there.

Cited report's part just states that Haskell sources use Unicode character set. It doesn't state which encoding should be used at all. In other words, it says which characters could appear in the sources, but doesn't say how they could be written in term of plain bytes.


There was a proposal to standardize on UTF-8 as the standard encoding of Haskell source files, but I'm not sure if it was accepted or not.

In practice, GHC assumes all input files are UTF-8, but it ignores malformed byte sequences in comments.