Encoding conversion in java
You don't need a library beyond the standard one - just use Charset. (You can just use the String constructors and getBytes methods, but personally I don't like just working with the names of character encodings. Too much room for typos.)
EDIT: As pointed out in comments, you can still use Charset instances but have the ease of use of the String methods: new String(bytes, charset) and String.getBytes(charset).
See "URL Encoding (or: 'What are those "%20
" codes in URLs?')".
CharsetDecoder
should be what you are looking for, no ?
Many network protocols and files store their characters with a byte-oriented character set such as ISO-8859-1
(ISO-Latin-1
).
However, Java's native character encoding is Unicode UTF16BE (Sixteen-bit UCS Transformation Format, big-endian byte order).
See Charset
. That doesn't mean UTF16
is the default charset (i.e.: the default "mapping between sequences of sixteen-bit Unicode code units and sequences of bytes"):
Every instance of the Java virtual machine has a default charset, which may or may not be one of the standard charsets.
[US-ASCII
,ISO-8859-1
a.k.a.ISO-LATIN-1
,UTF-8
,UTF-16BE
,UTF-16LE
,UTF-16
]
The default charset is determined during virtual-machine startup and typically depends upon the locale and charset being used by the underlying operating system.
This example demonstrates how to convert ISO-8859-1
encoded bytes in a ByteBuffer
to a string in a CharBuffer
and visa versa.
// Create the encoder and decoder for ISO-8859-1
Charset charset = Charset.forName("ISO-8859-1");
CharsetDecoder decoder = charset.newDecoder();
CharsetEncoder encoder = charset.newEncoder();
try {
// Convert a string to ISO-LATIN-1 bytes in a ByteBuffer
// The new ByteBuffer is ready to be read.
ByteBuffer bbuf = encoder.encode(CharBuffer.wrap("a string"));
// Convert ISO-LATIN-1 bytes in a ByteBuffer to a character ByteBuffer and then to a string.
// The new ByteBuffer is ready to be read.
CharBuffer cbuf = decoder.decode(bbuf);
String s = cbuf.toString();
} catch (CharacterCodingException e) {
}
I would just like to add that if the String is originally encoded using the wrong encoding it might be impossible to change it to another encoding without errors. The question does not state that the conversion here is made from wrong encoding to correct encoding but I personally stumbled to this question just because of this situation so just a heads up for others as well.
This answer in other question gives an explanation why the conversion does not always yield correct results https://stackoverflow.com/a/2623793/4702806