What is the UTF-8 representation of "end of line" in a text file?

There are a bunch:

  • LF: Line Feed, U+000A (UTF-8 in hex: 0A)
  • VT: Vertical Tab, U+000B (UTF-8 in hex: 0B)
  • FF: Form Feed, U+000C (UTF-8 in hex: 0C)
  • CR: Carriage Return, U+000D (UTF-8 in hex: 0D)
  • CR+LF: CR (U+000D) followed by LF (U+000A) (UTF-8 in hex: 0D0A)
  • NEL: Next Line, U+0085 (UTF-8 in hex: C285)
  • LS: Line Separator, U+2028 (UTF-8 in hex: E280A8)
  • PS: Paragraph Separator, U+2029 (UTF-8 in hex: E280A9)

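These are the sequences listed in the Unicode Standard's newline guidelines; individual systems and protocols may define others.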

The most commonly used ones are LF (*nix), CR+LF (Windows and DOS), and CR (mostly classic Mac OS, before OS X).
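If you want to check the hex values in the list yourself, here is a minimal Java sketch. The class name `LineTerminatorBytes` and the helper `utf8Hex` are made up for illustration; only `String.getBytes` and `StandardCharsets.UTF_8` come from the JDK. Code points are passed as numbers rather than as Unicode escapes in the source, because the Java compiler translates Unicode escapes before parsing and an escaped LF or CR inside a string literal would not compile.

```java
import java.nio.charset.StandardCharsets;

class LineTerminatorBytes {

    // Hypothetical helper: build a string from the given code points,
    // encode it as UTF-8, and return the bytes as uppercase hex.
    static String utf8Hex(int... codePoints) {
        String text = new String(codePoints, 0, codePoints.length);
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            // Mask to 0..255 so bytes >= 0x80 are not sign-extended.
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println("LF    " + utf8Hex(0x000A));
        System.out.println("VT    " + utf8Hex(0x000B));
        System.out.println("FF    " + utf8Hex(0x000C));
        System.out.println("CR    " + utf8Hex(0x000D));
        System.out.println("CR+LF " + utf8Hex(0x000D, 0x000A));
        System.out.println("NEL   " + utf8Hex(0x0085));
        System.out.println("LS    " + utf8Hex(0x2028));
        System.out.println("PS    " + utf8Hex(0x2029));
    }
}
```

The first five stay at one byte per character because they are in the ASCII range; NEL takes two bytes and LS/PS take three.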


UTF-8 is backward compatible with ASCII, so the ASCII codes 10 (0x0A) for line feed and 13 (0x0D) for carriage return are encoded as the same single bytes in UTF-8.
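A quick way to convince yourself of this in Java is to encode the same string with both charsets and compare the bytes. This is only a sketch; the class name `AsciiCompatCheck` and the method `sameBytesInAsciiAndUtf8` are hypothetical names chosen for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiCompatCheck {

    // Hypothetical helper: true if the text encodes to identical bytes
    // in US-ASCII and in UTF-8.
    static boolean sameBytesInAsciiAndUtf8(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(ascii, utf8);
    }

    public static void main(String[] args) {
        System.out.println(sameBytesInAsciiAndUtf8("\n"));    // LF
        System.out.println(sameBytesInAsciiAndUtf8("\r"));    // CR
        System.out.println(sameBytesInAsciiAndUtf8("\r\n"));  // CR+LF
        // Decimal byte values of CR+LF in UTF-8: 13 and 10.
        System.out.println(Arrays.toString("\r\n".getBytes(StandardCharsets.UTF_8)));
    }
}
```

The comparison only holds for characters in the ASCII range. A terminator such as NEL (U+0085) has no ASCII encoding at all, so the check returns false for it.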


From the Unicode character reference for 'LINE FEED (LF)':

UTF-8 (hex) --> 0x0A (0a)
UTF-8 (binary) --> 00001010

