S3 -> Redshift cannot handle UTF8
tl;dr
the byte length for your varchar
column just needs to be larger.
Detail
Multi-byte characters (UTF-8) are supported in the varchar
data type, however the length that is provided is in bytes, NOT characters.
AWS documentation for Multibyte Character Load Errors states the following:
VARCHAR
columns accept multibyte UTF-8 characters, to a maximum of four bytes.
Therefore if you want the character Ä
to be allowed, then you need to allow 2 bytes for this character, instead of 1 byte.
AWS documentation for VARCHAR or CHARACTER VARYING states the following:
... so a
VARCHAR(120)
column consists of a maximum of 120 single-byte characters, 60 two-byte characters, 40 three-byte characters, or 30 four-byte characters.
For a list of UTF-8 characters and their byte lengths, this is a good reference: Complete Character List for UTF-8
Detailed information for the Unicode Character 'LATIN CAPITAL LETTER A WITH DIAERESIS' (U+00C4) can be found here.
Please check below link
http://docs.aws.amazon.com/redshift/latest/dg/multi-byte-character-load-errors.html
You should use ACCEPTINVCHARS in you copy command. Details here
http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html#acceptinvchars