How do I convert between ISO-8859-1 and UTF-8 in Java?
Which worked for me: ("üzüm bağları" is the correct written in Turkish)
Convert ISO-8859-1 to UTF-8:
String encodedWithISO88591 = "üzüm baÄları";
String decodedToUTF8 = new String(encodedWithISO88591.getBytes("ISO-8859-1"), "UTF-8");
//Result, decodedToUTF8 --> "üzüm bağları"
Convert UTF-8 to ISO-8859-1
String encodedWithUTF8 = "üzüm bağları";
String decodedToISO88591 = new String(encodedWithUTF8.getBytes("UTF-8"), "ISO-8859-1");
//Result, decodedToISO88591 --> "üzüm baÄları"
In general, you can't do this. UTF-8 is capable of encoding any Unicode code point. ISO-8859-1 can handle only a tiny fraction of them. So, transcoding from ISO-8859-1 to UTF-8 is no problem. Going backwards from UTF-8 to ISO-8859-1 will cause "replacement characters" (�) to appear in your text when unsupported characters are found.
To transcode text:
byte[] latin1 = ...
byte[] utf8 = new String(latin1, "ISO-8859-1").getBytes("UTF-8");
or
byte[] utf8 = ...
byte[] latin1 = new String(utf8, "UTF-8").getBytes("ISO-8859-1");
You can exercise more control by using the lower-level Charset
APIs. For example, you can raise an exception when an un-encodable character is found, or use a different character for replacement text.
If you have a String
, you can do that:
String s = "test";
try {
s.getBytes("UTF-8");
} catch(UnsupportedEncodingException uee) {
uee.printStackTrace();
}
If you have a 'broken' String
, you did something wrong, converting a String
to a String
in another encoding is defenetely not the way to go! You can convert a String
to a byte[]
and vice-versa (given an encoding). In Java String
s are AFAIK encoded with UTF-16
but that's an implementation detail.
Say you have a InputStream
, you can read in a byte[]
and then convert that to a String
using
byte[] bs = ...;
String s;
try {
s = new String(bs, encoding);
} catch(UnsupportedEncodingException uee) {
uee.printStackTrace();
}
or even better (thanks to erickson) use InputStreamReader
like that:
InputStreamReader isr;
try {
isr = new InputStreamReader(inputStream, encoding);
} catch(UnsupportedEncodingException uee) {
uee.printStackTrace();
}
Here is an easy way with String output (I created a method to do this):
public static String (String input){
String output = "";
try {
/* From ISO-8859-1 to UTF-8 */
output = new String(input.getBytes("ISO-8859-1"), "UTF-8");
/* From UTF-8 to ISO-8859-1 */
output = new String(input.getBytes("UTF-8"), "ISO-8859-1");
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
return output;
}
// Example
input = "Música";
output = "Música";