How does "cut and paste" affect character encoding and what can go wrong?

I observed the following behavior when I looked into Unicode normalization: When copying a canonically decomposed string (NFD) in Firefox in macOS 10.15.7, the string is normalized to NFC when pasting it in Chrome. What's weird is that the pasting affects the content of the clipboard: When pasting the string in Firefox again, it's then also canonically composed there. If I don't paste it anywhere else before pasting it in Firefox again, the NFD form survives. Interestingly, the problem doesn't occur in the other direction: When copying a canonically decomposed string in Chrome, it's pasted in NFD form anywhere I can tell. My conclusion is that Firefox stores text to the clipboard differently from other applications. One way to play around with this yourself is to copy 'mañana' === 'mañana' to your JavaScript console. The statement returns false if the NFD form of the string on the right survived the copy & paste.
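
To take the clipboard out of the picture while checking this, both forms can be built from escape sequences, so nothing on the way can silently recompose them. This is only a small sketch; the helper names `describe` and `isNFC` are made up for illustration, while `String.prototype.normalize` is the standard method.

```javascript
// Build both forms from escapes so that no editor or clipboard can alter them.
const nfc = "ma\u00F1ana";   // ñ as a single precomposed code point (U+00F1)
const nfd = "man\u0303ana";  // n followed by a combining tilde (U+0303)

// Hypothetical helper: list the code points of a string as U+XXXX.
function describe(s) {
  return Array.from(s, ch =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  ).join(" ");
}

// Hypothetical helper: true if the string is already in NFC.
function isNFC(s) {
  return s === s.normalize("NFC");
}

console.log(nfc === nfd);                   // false: different code point sequences
console.log(nfc.normalize("NFD") === nfd);  // true: same text after normalization
console.log(describe(nfc));
console.log(describe(nfd));
console.log(isNFC(nfc), isNFC(nfd));
```

Running `describe` or `isNFC` on a pasted string shows which form actually arrived.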


First of all, a text editor's internal representation of text has no bearing on how the text is encoded (serialized) when you save the file. So a document is not "in" an encoding; it's a sequence of abstract characters. When the document is saved to a file (or transmitted over the network) then it gets encoded.
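
To illustrate the difference between the characters and their encoding, here is a rough sketch that serializes the same string two ways and decodes both back to the same text. `TextEncoder` and `TextDecoder` are the standard APIs; `utf16leBytes` is a helper written just for this example, because `TextEncoder` only produces UTF-8.

```javascript
const text = "ma\u00F1ana";  // the abstract character sequence

// Serialize as UTF-8: ñ (U+00F1) takes two bytes, the ASCII letters one each.
const utf8 = new TextEncoder().encode(text);

// Hypothetical helper: serialize the string's UTF-16 code units, little-endian.
function utf16leBytes(s) {
  const bytes = new Uint8Array(s.length * 2);
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    bytes[2 * i] = unit & 0xff;  // low byte first
    bytes[2 * i + 1] = unit >> 8;  // then high byte
  }
  return bytes;
}
const utf16 = utf16leBytes(text);

console.log(utf8.length, utf16.length);  // different byte counts for the same text

// Decoding either byte sequence yields the same characters again.
const fromUtf8 = new TextDecoder("utf-8").decode(utf8);
const fromUtf16 = new TextDecoder("utf-16le").decode(utf16);
console.log(fromUtf8 === text, fromUtf16 === text);
```

Note that encoding does not normalize: a decomposed string stays decomposed through either round trip unless some application explicitly normalizes it.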

It's up to each application to decide what it puts on the clipboard. Typically, a Windows app that knows what it's doing will put a number of different representations of the same selection on the clipboard. When you paste in the other app, that app will look for the representation that best suits its needs.
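
As a toy model of that negotiation (not any real clipboard API), the clipboard can be pictured as a map from format name to data, with the pasting app walking its own preference list. The format names and the function `pickRepresentation` are assumptions chosen for illustration.

```javascript
// Toy model: the copying app offers several representations of one selection.
const clipboard = new Map([
  ["text/html", "<b>ma\u00F1ana</b>"],
  ["text/plain", "ma\u00F1ana"],
]);

// Hypothetical helper: return the first offered format the pasting app prefers.
function pickRepresentation(offered, preferences) {
  for (const type of preferences) {
    if (offered.has(type)) {
      return { type, data: offered.get(type) };
    }
  }
  return null;  // nothing usable on the clipboard
}

// A rich-text editor would rather have HTML; a plain-text editor only wants text.
const richPaste = pickRepresentation(clipboard, ["text/html", "text/plain"]);
const plainPaste = pickRepresentation(clipboard, ["text/plain"]);
const imagePaste = pickRepresentation(clipboard, ["image/png"]);

console.log(richPaste.type, plainPaste.type, imagePaste);
```

In a model like this, two apps pasting from the same clipboard can end up with different data simply because they picked different representations.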

In your case, a text editor (that knows what it's doing) will put a Unicode representation of the selected string onto the clipboard. (In Windows, Unicode text is typically moved around as UTF-16, but that's not important here.) When you paste in the other app, that app will insert the sequence of Unicode characters into its document at the selection point.

There's an app floating around called "ClipSpy" that will help you see what I'm talking about, interactively.