How exactly does the one-byte "char" type work in PostgreSQL?

### 1. `chr(10)`

... produces the LINEFEED character (a.k.a. escape sequence `\n`), and psql displays the character with a newline (indicated by `+`). Everything correct there.
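
A minimal demo; in psql's default aligned output, the embedded newline shows up as a trailing `+`:

```sql
SELECT 'a' || chr(10) || 'b' AS two_lines;
-- psql displays something like:
--  two_lines
-- -----------
--  a        +
--  b
-- (1 row)
```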

### 2. & 3. `ascii()` produces 128 or 192?

It starts with a mistake I made. I carelessly assumed `"char"` would cover the range of an unsigned 1-byte integer (0 to 255) in the referenced answer (now fixed), but it actually uses the range of a signed 1-byte integer (-128 to 127) internally.
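
A quick check of both extremes, using the direct casts between `integer` and `"char"` (output from PostgreSQL 15, where the signed range is enforced consistently; some older versions differed in edge cases):

```sql
-- Both extremes of the signed byte range round-trip through integer:
SELECT (-128)::"char"::int AS min_val
     , 127::"char"::int    AS max_val;
--  min_val | max_val
-- ---------+---------
--     -128 |     127

-- Values outside that range are rejected:
SELECT 128::"char";
-- ERROR:  "char" out of range
```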

`ascii()` takes a `text` parameter. The implicit cast from `"char"` to `text` produces a multibyte-encoded character in Unicode, and the function returns (per the documentation on `ascii()`):

> ASCII code of the first character of the argument. For UTF8 returns the Unicode code point of the character. For other multibyte encodings, the argument must be an ASCII character.

So we get a lot of truncated values. 128 and 192 come from the leading byte of multibyte characters: in UTF8, continuation bytes start at 128 (`0x80`), and the lead bytes of 2-byte characters start at 192 (`0xC0`).
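
To watch the truncation happen, here is a sketch assuming a UTF8 database and the pre-15 cast behavior, where the cast passes the raw byte through (PostgreSQL 15 changed how `"char"` handles non-ASCII bytes, so details differ there):

```sql
-- 'ä' occupies two bytes in UTF8 (0xC3 0xA4); the cast to "char" keeps
-- only the lead byte, so the round-trip to text yields a single byte:
SELECT octet_length('ä'::text)         AS full_bytes   -- 2
     , octet_length('ä'::"char"::text) AS kept_bytes;  -- 1

-- ascii() then decodes a lone lead byte instead of a complete character:
SELECT ascii('ä')         AS code_point  -- 228
     , ascii('ä'::"char") AS truncated;  -- 192
```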

### 4. The null byte

The inability to store null bytes only affects regular character types (`text`, `char`, `varchar`), not `"char"`. It applies to my buggy example because I cast to `text` as a stepping stone. When casting between `"char"` and `integer` directly, the limitation does not apply. The manual on `chr()`:

> The NULL (0) character is not allowed because text data types cannot store such bytes.
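
Accordingly, `chr(0)` raises an error:

```sql
SELECT chr(0);
-- ERROR:  null character not permitted
```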

Not so for `"char"`, where `0` is mapped to the empty string `''`:

```sql
SELECT ''::"char"::int  -- 0
     , 0::"char" = '';  -- t
```

Remember: `"char"` is still an "internal" type intended for simple and cheap enumeration. It's not officially designed for what we are doing here, and not portable to other RDBMS. There are no guarantees by the Postgres project for this.