What are the most common non-BMP Unicode characters in actual use?

Emoji are now the most common non-BMP characters by far. 😂, otherwise known as U+1F602 FACE WITH TEARS OF JOY, is the most common one on Twitter's public stream. It occurs more frequently than the tilde!


Excellent question!

The answer is the mathematical letters. This past December I did a scan of the entire PubMed Open Access corpus, and came up with these figures for astral characters in it.

The first number in the figures below is how many copies of each given code point I found in the entire corpus. First, though, to give you a notion of the relative frequencies, here are the top ten trans-ASCII code points in that corpus:

 2663710 U+002013 ‹–›  GC=Pd    EN DASH
 1065594 U+0000A0 ‹ ›  GC=Zs    NO-BREAK SPACE
 1009762 U+0000B1 ‹±›  GC=Sm    PLUS-MINUS SIGN
  784139 U+002212 ‹−›  GC=Sm    MINUS SIGN
  602377 U+002003 ‹ ›  GC=Zs    EM SPACE
  528576 U+0003BC ‹μ›  GC=Ll    GREEK SMALL LETTER MU
  519669 U+0003B2 ‹β›  GC=Ll    GREEK SMALL LETTER BETA
  512312 U+0003B1 ‹α›  GC=Ll    GREEK SMALL LETTER ALPHA
  491842 U+00200A ‹ ›  GC=Zs    HAIR SPACE
  462505 U+0000B0 ‹°›  GC=So    DEGREE SIGN

And here now are the trans-BMP code points, in order of descending frequency:

     544 U+01D49E ‹𝒞›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL C
     450 U+01D4AF ‹𝒯›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL T
     385 U+01D4AE ‹𝒮›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL S
     292 U+01D49F ‹𝒟›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL D
     285 U+01D4B3 ‹𝒳›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL X
     262 U+01D4A9 ‹𝒩›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL N
     258 U+01D4AB ‹𝒫›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL P
     254 U+01D4A2 ‹𝒢›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL G
     185 U+01D49C ‹𝒜›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL A
     178 U+01D53C ‹𝔼›  GC=Lu    MATHEMATICAL DOUBLE-STRUCK CAPITAL E
     137 U+01D4AA ‹𝒪›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL O
      56 U+01D4A5 ‹𝒥›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL J
      48 U+01D4A6 ‹𝒦›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL K
      44 U+01D4B1 ‹𝒱›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL V
      43 U+01D4B2 ‹𝒲›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL W
      42 U+01D4B4 ‹𝒴›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL Y
      41 U+01D4B5 ‹𝒵›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL Z
      35 U+01D4B0 ‹𝒰›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL U
      30 U+01D4AC ‹𝒬›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL Q
      23 U+01D54A ‹𝕊›  GC=Lu    MATHEMATICAL DOUBLE-STRUCK CAPITAL S
      21 U+01D539 ‹𝔹›  GC=Lu    MATHEMATICAL DOUBLE-STRUCK CAPITAL B
      19 U+01D5A7 ‹𝖧›  GC=Lu    MATHEMATICAL SANS-SERIF CAPITAL H
      18 U+01D517 ‹𝔗›  GC=Lu    MATHEMATICAL FRAKTUR CAPITAL T
      15 U+01D4C3 ‹𝓃›  GC=Ll    MATHEMATICAL SCRIPT SMALL N
      14 U+01D535 ‹𝔵›  GC=Ll    MATHEMATICAL FRAKTUR SMALL X
      13 U+01D4BF ‹𝒿›  GC=Ll    MATHEMATICAL SCRIPT SMALL J
      11 U+01D540 ‹𝕀›  GC=Lu    MATHEMATICAL DOUBLE-STRUCK CAPITAL I
       9 U+01D465 ‹𝑥›  GC=Ll    MATHEMATICAL ITALIC SMALL X
       9 U+01D4CE ‹𝓎›  GC=Ll    MATHEMATICAL SCRIPT SMALL Y
       9 U+01D538 ‹𝔸›  GC=Lu    MATHEMATICAL DOUBLE-STRUCK CAPITAL A
       8 U+01D4C2 ‹𝓂›  GC=Ll    MATHEMATICAL SCRIPT SMALL M
       8 U+01D54D ‹𝕍›  GC=Lu    MATHEMATICAL DOUBLE-STRUCK CAPITAL V
       7 U+01D4B6 ‹𝒶›  GC=Ll    MATHEMATICAL SCRIPT SMALL A
       7 U+01D4BE ‹𝒾›  GC=Ll    MATHEMATICAL SCRIPT SMALL I
       7 U+01D4CC ‹𝓌›  GC=Ll    MATHEMATICAL SCRIPT SMALL W
       7 U+01D516 ‹𝔖›  GC=Lu    MATHEMATICAL FRAKTUR CAPITAL S
       4 U+01D4CF ‹𝓏›  GC=Ll    MATHEMATICAL SCRIPT SMALL Z
       4 U+01D53B ‹𝔻›  GC=Lu    MATHEMATICAL DOUBLE-STRUCK CAPITAL D
       4 U+01D54B ‹𝕋›  GC=Lu    MATHEMATICAL DOUBLE-STRUCK CAPITAL T
       3 U+01D4BB ‹𝒻›  GC=Ll    MATHEMATICAL SCRIPT SMALL F
       3 U+01D4CA ‹𝓊›  GC=Ll    MATHEMATICAL SCRIPT SMALL U
       3 U+01D507 ‹𝔇›  GC=Lu    MATHEMATICAL FRAKTUR CAPITAL D
       3 U+01D542 ‹𝕂›  GC=Lu    MATHEMATICAL DOUBLE-STRUCK CAPITAL K
       3 U+01D546 ‹𝕆›  GC=Lu    MATHEMATICAL DOUBLE-STRUCK CAPITAL O
       2 U+01D4BD ‹𝒽›  GC=Ll    MATHEMATICAL SCRIPT SMALL H
       2 U+01D4C5 ‹𝓅›  GC=Ll    MATHEMATICAL SCRIPT SMALL P
       2 U+01D505 ‹𝔅›  GC=Lu    MATHEMATICAL FRAKTUR CAPITAL B
       2 U+01D50E ‹𝔎›  GC=Lu    MATHEMATICAL FRAKTUR CAPITAL K
       2 U+01D541 ‹𝕁›  GC=Lu    MATHEMATICAL DOUBLE-STRUCK CAPITAL J
       2 U+01D543 ‹𝕃›  GC=Lu    MATHEMATICAL DOUBLE-STRUCK CAPITAL L
       2 U+100002 ‹􀀂›  GC=Co    <private use character>
       1 U+01D4B8 ‹𝒸›  GC=Ll    MATHEMATICAL SCRIPT SMALL C
       1 U+01D4C1 ‹𝓁›  GC=Ll    MATHEMATICAL SCRIPT SMALL L
       1 U+01D53D ‹𝔽›  GC=Lu    MATHEMATICAL DOUBLE-STRUCK CAPITAL F
       1 U+01D53E ‹𝔾›  GC=Lu    MATHEMATICAL DOUBLE-STRUCK CAPITAL G
       1 U+01D54C ‹𝕌›  GC=Lu    MATHEMATICAL DOUBLE-STRUCK CAPITAL U
       1 U+01D6A4 ‹𝚤›  GC=Ll    MATHEMATICAL ITALIC SMALL DOTLESS I
       1 U+01D7D9 ‹𝟙›  GC=Nd    MATHEMATICAL DOUBLE-STRUCK DIGIT ONE

I really wish I knew what they were using U+100002 to do. :(

If those aren't showing up in your browser, you should install George Douros’s Symbola font, or get it from another mirror for download. It has all the fun Unicode 6.0.0 code points in it, too.
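
For anyone who wants to run a similar tally on their own text, here is a rough Python sketch. It is not the tooling used for the scan above (that isn't shown here); it simply assumes the corpus is already available as decoded text, and the function names `count_astral`, `format_row`, and `report` are made up for illustration. It counts every code point above U+FFFF and prints rows in roughly the same layout as the tables above.

```python
import unicodedata
from collections import Counter


def count_astral(text):
    """Tally code points outside the BMP (above U+FFFF) in decoded text."""
    return Counter(ch for ch in text if ord(ch) > 0xFFFF)


def format_row(ch, n):
    """Format one row: count, code point, glyph, general category, name."""
    gc = unicodedata.category(ch)
    # Private-use and unassigned code points have no name, so fall back
    # to a placeholder instead of letting unicodedata.name() raise.
    name = unicodedata.name(ch, "<unnamed>")
    return f"{n:8d} U+{ord(ch):06X} ‹{ch}›  GC={gc}    {name}"


def report(text):
    """Return formatted rows, most frequent first, ties by code point."""
    counts = count_astral(text)
    rows = sorted(counts.items(), key=lambda kv: (-kv[1], ord(kv[0])))
    return [format_row(ch, n) for ch, n in rows]


if __name__ == "__main__":
    # Hypothetical sample text, standing in for a real corpus.
    sample = "Let \U0001D49E and \U0001D49E be categories over \U0001D53C."
    for line in report(sample):
        print(line)
```

For a corpus the size of PubMed Open Access you would presumably feed files through `count_astral` one at a time and add up the resulting `Counter` objects rather than loading everything into one string.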


For me, it's the Mathematical Alphanumeric Symbols, which are used for math typesetting with OpenType fonts such as Cambria Math.