What is the Unicode variation selector?
The Unicode standard talks about this. Here's a bit of the relevant section from 3.2.0, annex 28 (I'm sure there are more recent versions around; this is the first I found):
Unicode characters can be represented by a wide variety of glyphs, as discussed in Chapter 2, General Structure in The Unicode Standard, Version 3.0. Occasionally the need arises in text processing to restrict or change the set of glyphs that are to be used to represent a character. Normally such changes are indicated by choice of font or style in rich-text documents. In special circumstances, such a variation from the normal range of appearance needs to be expressed side-by-side in the same document in plain-text contexts, where it is impossible or inconvenient to exchange formatted text. For example, in languages employing the Mongolian script, sometimes a specific variant range of glyphs is needed for a specific textual purpose for which the range of “generic” glyphs is considered inappropriate. The variation selectors are used when characters have essentially the same semantic.
Variation selectors provide a mechanism for specifying a restriction on the set of glyphs that are used to represent a particular character. They also provide a mechanism for specifying variants, such as for CJK Ideographs and Mongolian, that have essentially the same semantic but have substantially different ranges of glyphs. A variation sequence, which always consists of a base character followed by the variation selector, may be specified as part of the Unicode Standard. That sequence is referred to as a variant of the base character. The variation selector affects only the appearance of the base character,* and only in the variation sequences defined in this Standard. The variation selector is not used as a general code extension mechanism.
(It goes on...)
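To make the "base character followed by the variation selector" rule concrete, here is a minimal Python sketch that scans a string for such pairs. The helper names are made up for illustration, and the sketch assumes only the two generic selector ranges (U+FE00..U+FE0F and U+E0100..U+E01EF); the Mongolian free variation selectors are left out. It also does not check whether a pair is actually one of the sequences defined by the standard.

```python
# Generic variation selector ranges: VS1..VS16 and VS17..VS256.
# Assumption of this sketch: Mongolian free variation selectors are ignored.
VS_RANGES = ((0xFE00, 0xFE0F), (0xE0100, 0xE01EF))


def is_variation_selector(ch: str) -> bool:
    """Return True if ch falls in one of the generic selector ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in VS_RANGES)


def find_variation_sequences(text: str) -> list[tuple[str, str]]:
    """Return (base, selector) pairs where a selector directly follows a base."""
    pairs = []
    for base, nxt in zip(text, text[1:]):
        # A selector only applies to the character immediately before it.
        if is_variation_selector(nxt) and not is_variation_selector(base):
            pairs.append((base, nxt))
    return pairs


if __name__ == "__main__":
    for base, selector in find_variation_sequences("a\u2764\ufe0fb"):
        print(f"U+{ord(base):04X} + U+{ord(selector):04X}")
```

To decide whether a pair found this way is a defined variation sequence, you would still have to look it up in the data files published with the standard.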
You may also be interested in the Standardized Variants (this time from 6.0.0).
This is not a complete answer to the question, but it's pertinent to emoji and variation selectors:
The ❤ character (U+2764 code point) is a Unicode character from 1993.
But the ❤️ emoji is actually the ❤ (U+2764) character followed by Variation Selector-16 (VS16, U+FE0F).
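You can check this yourself by listing the code points in the string. A small Python sketch (the `describe` helper is just an illustrative name, not a library function):

```python
import unicodedata

heart_emoji = "\u2764\ufe0f"  # the heart character followed by VS16


def describe(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' entry per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


if __name__ == "__main__":
    for entry in describe(heart_emoji):
        print(entry)
    # U+2764 HEAVY BLACK HEART
    # U+FE0F VARIATION SELECTOR-16
```

So what looks like a single emoji is a string of length 2 as far as Python is concerned.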
Why?
Speaking only of emoji (documentation):
VS15 and VS16 are reserved to determine whether or not a character should be displayed as an emoji. [...]
Emoji variation sequences contain VS16 (U+FE0F) for emoji-style (with color) or VS15 (U+FE0E) for text style (monochrome)
If a character (or symbol, glyph, etc.) is also intended to be usable as an emoji, appending Variation Selector-16 tells the renderer to display it as an emoji, while appending Variation Selector-15 tells the renderer to display it as plain text. If no variation selector is appended, the default presentation depends on the Unicode specification: for Emoticons the default is emoji, while for other characters such as ❤ the default is text.
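In code, requesting one presentation or the other is just a matter of appending the right selector. Here is a rough Python sketch; `with_presentation` is a hypothetical helper, and whether the result actually renders differently depends on the font and renderer, and on the base character having emoji variation sequences defined for it.

```python
VS15 = "\ufe0e"  # VARIATION SELECTOR-15: text presentation
VS16 = "\ufe0f"  # VARIATION SELECTOR-16: emoji presentation


def with_presentation(base: str, emoji: bool) -> str:
    """Return base followed by VS16 (emoji style) or VS15 (text style).

    Any VS15/VS16 already trailing the base is dropped first, so the
    result never carries two selectors in a row.
    """
    bare = base.rstrip(VS15 + VS16)
    return bare + (VS16 if emoji else VS15)


if __name__ == "__main__":
    heart = "\u2764"
    print(with_presentation(heart, emoji=True))   # requests emoji style
    print(with_presentation(heart, emoji=False))  # requests text style
```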
Another example, from the Emoticons (Unicode block) documentation:
Each emoticon has two variants:
U+FE0E (VARIATION SELECTOR-15) selects text presentation (e.g. 😊︎ 😐︎ ☹︎)
U+FE0F (VARIATION SELECTOR-16) selects emoji-style (e.g. 😊️ 😐️ ☹️).
If there is no variation selector appended, the default is the emoji-style. Example:
U+1F610 (NEUTRAL FACE) 😐
U+1F610 (NEUTRAL FACE), U+FE0E (VARIATION SELECTOR-15) 😐︎
U+1F610 (NEUTRAL FACE), U+FE0F (VARIATION SELECTOR-16) 😐️
Note: VS15 and VS16 are not required for a valid emoji. Many emoji contain no variation selector at all.
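One practical consequence: the same symbol with and without a selector is a different string, so a plain equality check will not match them. If you only care about the underlying character, one possible approach (a sketch, with a made-up helper name) is to drop VS15 and VS16 before comparing. Keep in mind that this throws away the requested presentation, so use it for comparison or search, not for the text you display.

```python
def strip_presentation_selectors(text: str) -> str:
    """Remove every VS15 (U+FE0E) and VS16 (U+FE0F) from text."""
    return text.replace("\ufe0e", "").replace("\ufe0f", "")


if __name__ == "__main__":
    plain_heart = "\u2764"
    emoji_heart = "\u2764\ufe0f"
    grinning = "\U0001F600"  # GRINNING FACE: an emoji with no selector

    print(plain_heart == emoji_heart)  # False
    print(strip_presentation_selectors(emoji_heart) == plain_heart)  # True
    print(strip_presentation_selectors(grinning) == grinning)  # True
```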