Is it a coincidence that the first 4 bytes of a PGP/GPG file are ellipsis, smile, female sign and a heart?
Yes, it's a coincidence that the first bytes appear to you as these symbols. The bytes themselves are defined by the OpenPGP message format (RFC 4880) and vary with the packet's type, length and version.
Let's create a file containing only those bytes (plus a trailing newline) and try to read it as a GPG message:
$ printf '\x85\x02\x0c\x03\n' > foo.gpg && gpg --list-packets foo.gpg
# off=0 ctb=85 tag=1 hlen=3 plen=524
:pubkey enc packet: version 3, algo 255, keyid 0AFFFFFFFFFFFFFF
unsupported algorithm 255
- The first byte (0x85 = 0b10000101) is the cipher type byte (CTB), which RFC 4880 calls the packet tag. It describes the packet type. Reading from the most significant bit, we can break it up as follows:
  - 1: CTB indicator bit (always set)
  - 0: old packet format (see RFC 1991)
  - 0001: packet tag 1, a public-key encrypted session key packet
  - 01: the packet-length field is 2 bytes long
- The second and third bytes denote the packet length (0x020c = 524).
- The fourth byte (0x03) is the first byte of the packet body; for this packet type it is the version number, 3.
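The same decoding can be written as a short Python sketch. It assumes only the old-format header layout from RFC 4880 (section 4.2) with 1-, 2- or 4-byte length fields; parse_old_header is a hypothetical helper name, not part of GnuPG or any OpenPGP library.

```python
def parse_old_header(data: bytes) -> dict:
    """Decode an old-format OpenPGP packet header (length types 0-2 only)."""
    if not data:
        raise ValueError("empty input")
    ctb = data[0]
    if not ctb & 0x80:
        raise ValueError("bit 7 of the packet tag must always be set")
    if ctb & 0x40:
        raise ValueError("new-format packet; not handled in this sketch")
    tag = (ctb >> 2) & 0x0F      # bits 5-2: packet type
    length_type = ctb & 0x03     # bits 1-0: size of the length field
    if length_type == 3:
        raise ValueError("indeterminate length; not handled in this sketch")
    n = 1 << length_type         # 0 -> 1 byte, 1 -> 2 bytes, 2 -> 4 bytes
    if len(data) < 1 + n + 1:
        raise ValueError("truncated header")
    return {
        "tag": tag,
        "header_len": 1 + n,
        "body_len": int.from_bytes(data[1:1 + n], "big"),
        # First body byte; for tag 1 this is the packet's version number.
        "version": data[1 + n],
    }

print(parse_old_header(b"\x85\x02\x0c\x03"))
# {'tag': 1, 'header_len': 3, 'body_len': 524, 'version': 3}
```

The tag, header length and body length match the tag=1, hlen=3 and plen=524 fields that gpg --list-packets reports.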
As you can see, these bytes are meaningful and not magic number constants that you can remove without losing information. If you cut them off, you are corrupting the GPG packet and it will require some guesswork to reconstruct it.
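The bytes are shown as smileys and hearts because that's how your (probably DOS) terminal displays non-printable control characters. In character sets that originate from code page 437, the control bytes 0x01 to 0x1F are traditionally drawn as icons: 0x02 is ☻, 0x03 is ♥ and 0x0C is ♀. (The leading 0x85 is not a control character: CP437 draws it as "à", while Windows-1252 shows it as "…".) Here's the original CP437 character set on an IBM PC:
[Image: the CP437 character set as displayed on an IBM PC]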
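To see how four header bytes can turn into pictures, here is a small Python sketch that mimics a DOS-style console. Bytes 0x01 to 0x1F are replaced by their CP437 glyphs, and every other byte is decoded with a chosen code page. The names render and CP437_CONTROL_GLYPHS are hypothetical. The glyph table is spelled out by hand because Python's built-in cp437 codec maps the control range to plain control characters rather than to these icons.

```python
# CP437 glyphs for bytes 0x01..0x1F, in order (0x01 is ☺, 0x1F is ▼).
CP437_CONTROL_GLYPHS = "☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼"

def render(data: bytes, codec: str = "cp437") -> str:
    """Show bytes roughly the way a DOS-style console would draw them."""
    out = []
    for b in data:
        if 0x01 <= b <= 0x1F:
            out.append(CP437_CONTROL_GLYPHS[b - 1])
        else:
            # Anything else is decoded with the chosen code page;
            # bytes the code page leaves undefined become U+FFFD.
            out.append(bytes([b]).decode(codec, errors="replace"))
    return "".join(out)

header = b"\x85\x02\x0c\x03"
print(render(header))                  # à☻♀♥
print(render(header, codec="cp1252"))  # …☻♀♥
```

With CP437 the first byte comes out as "à". Decoding it as Windows-1252 instead gives "…", which suggests the ellipsis in the question was rendered with a different code page than the smileys.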
As a general principle, well-designed binary file formats¹ will have their first few bytes be a magic number identifying the format. ELF executables' first four bytes are always 7f 45 4c 46, PNG files' first eight bytes are always 89 50 4e 47 0d 0a 1a 0a, and so on. Well-designed encrypted file formats will always follow that magic number with an unencrypted "header" that reveals the encryption algorithm, the length of the encrypted data, things like that.
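Checking for a magic number is just a prefix comparison. The Python sketch below assumes a lookup table containing only the two formats named above (real tools such as file(1) consult a much larger magic database); sniff and MAGIC_NUMBERS are hypothetical names.

```python
from typing import Optional

# Only the two magic numbers quoted in the text.
MAGIC_NUMBERS = {
    b"\x7fELF": "ELF executable",            # 7f 45 4c 46
    b"\x89PNG\r\n\x1a\n": "PNG image",      # 89 50 4e 47 0d 0a 1a 0a
}

def sniff(header: bytes) -> Optional[str]:
    """Return the format name if header starts with a known magic number."""
    for magic, name in MAGIC_NUMBERS.items():
        if header.startswith(magic):
            return name
    return None

print(sniff(b"\x7fELF\x02\x01\x01"))  # ELF executable
print(sniff(b"\x85\x02\x0c\x03"))     # None
```

The OpenPGP bytes from the question fall through to None: an old-format OpenPGP file starts with a packet tag whose value depends on the packet, not with a fixed signature.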
This is not normally considered a security vulnerability, because of Kerckhoffs' principle, which says that a cryptosystem needs to be secure even if the attacker knows everything that the file header can tell them (such as the algorithm).
It's possible to design a file format, or a protocol, all of whose bytes are indistinguishable from randomness unless you already know the decryption key, but it's surprisingly difficult (did you know that encrypting the expected length of encrypted data can introduce a vulnerability?) and it doesn't actually gain you anything. A file that's completely indistinguishable from the output of cat /dev/random will be just as suspicious to the secret police as an obviously GPG-encrypted file. Perhaps more suspicious, even, since there are all kinds of innocuous reasons to encrypt files.
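One crude way to see what "indistinguishable from randomness" looks like in practice is the Shannon entropy of a file's byte histogram. This is only a sketch, not a real distinguisher. Good ciphertext, compressed data and /dev/random output all score close to 8 bits per byte, so the statistic can't tell them apart; plain text scores noticeably lower. bits_per_byte is a hypothetical helper.

```python
import math
import os
from collections import Counter

def bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte histogram, from 0.0 to 8.0."""
    if not data:
        return 0.0
    n = len(data)
    # Sum of p * log2(1/p) over the byte values that actually occur.
    return sum(c / n * math.log2(n / c) for c in Counter(data).values())

print(bits_per_byte(b"\x00" * 1024))          # 0.0
print(bits_per_byte(bytes(range(256))))       # 8.0
print(bits_per_byte(os.urandom(1 << 16)))     # close to 8.0
```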
If you are worried about an attacker merely learning that you are using encryption to communicate with someone, you need steganography, which conceals secret information within ordinary-looking, unencrypted files. Be aware that the state of the art in steganography is not nearly as sophisticated as the state of the art in cryptography; last I checked, all known approaches were breakable by a determined adversary. (If the secret police's first impression is "oh, this is a memory card full of vacation photos", they might not bother digging any deeper…unless they already have a reason to suspect you.)
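The simplest classic steganographic technique is least-significant-bit (LSB) embedding: overwrite the lowest bit of each sample in an uncompressed image or audio file. The toy Python sketch below works on a plain byte string standing in for pixel data. That is an assumption; a real tool would have to parse the image format. The receiver must also know the secret's length, and embed and extract are hypothetical names. The scheme hides the message but does not encrypt it, and naive LSB embedding like this is known to be detectable by statistical tests.

```python
def embed(cover: bytes, secret: bytes) -> bytes:
    """Hide secret, most significant bit first, in the LSB of each cover byte."""
    bits = [(byte >> i) & 1 for byte in secret for i in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for the secret")
    out = bytearray(cover)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # each cover byte changes by at most 1
    return bytes(out)

def extract(stego: bytes, length: int) -> bytes:
    """Read length bytes back out of the least significant bits."""
    if 8 * length > len(stego):
        raise ValueError("not enough data")
    out = bytearray()
    for k in range(length):
        value = 0
        for b in stego[8 * k:8 * k + 8]:
            value = (value << 1) | (b & 1)
        out.append(value)
    return bytes(out)

cover = bytes(range(256))  # stand-in for raw pixel samples
stego = embed(cover, b"hi")
print(extract(stego, 2))   # b'hi'
```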
¹ I have no opinion about whether the GPG file format is well-designed.