Unicode in PDF

In the PDF reference in chapter 3, this is what they say about Unicode:

Text strings are encoded in either PDFDocEncoding or Unicode character encoding. PDFDocEncoding is a superset of the ISO Latin 1 encoding and is documented in Appendix D. Unicode is described in the Unicode Standard by the Unicode Consortium (see the Bibliography). For text strings encoded in Unicode, the first two bytes must be 254 followed by 255. These two bytes represent the Unicode byte order marker, U+FEFF, indicating that the string is encoded in the UTF-16BE (big-endian) encoding scheme specified in the Unicode standard. (This mechanism precludes beginning a string using PDFDocEncoding with the two characters thorn ydieresis, which is unlikely to be a meaningful beginning of a word or phrase).


Algoman's answer is wrong in many things. You can make a PDF document with Unicode in it and it's not rocket science, though it needs some work. Yes he is right, to use more than 255 characters in one font you have to create a composite font (CIDFont) pdf object. Then you just mention the actual TrueType font you want to use as a DescendatFont entry of CIDFont. The trick is that after that you have to use glyph indices of a font instead of character codes. To get this indices map you have to parse cmap section of a font - get contents of the font with GetFontData function and take hands on TTF specification. And that's it! I've just did it and now I have a Unicode PDF!

Sample Code for parsing cmap section is here: https://web.archive.org/web/20150329005245/http://support.microsoft.com/en-us/kb/241020

And yes, don't forget /ToUnicode entry as @user2373071 pointed out or user will not be able to search your PDF or copy text from it.


The simple answer is that there's no simple answer. If you take a look at the PDF specification, you'll see an entire chapter — and a long one at that — devoted to the mechanisms of text display. I implemented all of the PDF support for my company, and handling text was by far the most complex part of exercise. The solution you discovered — use a 3rd party library to do the work for you — is really the best choice, unless you have very specific, special-purpose requirements for your PDF files.