How are file types known if not from file suffix?

The file utility determines the file type with three sets of tests:

First come the filesystem tests. Here one of the stat family of system calls is invoked on the file, which reports the Unix file type: regular file, directory, symbolic link, character device, block device, named pipe or socket. Depending on the result, the magic tests are run next.
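To get a feeling for this first step, here is a minimal Python sketch that asks the kernel for the file type via lstat. The function name unix_file_type is made up for illustration, and this is not how file itself is implemented, just the same idea.

```python
import os
import stat


def unix_file_type(path):
    """Return a coarse Unix file type name for path (symlinks are not followed)."""
    mode = os.lstat(path).st_mode
    checks = [
        (stat.S_ISREG, "regular file"),
        (stat.S_ISDIR, "directory"),
        (stat.S_ISLNK, "symbolic link"),
        (stat.S_ISCHR, "character device"),
        (stat.S_ISBLK, "block device"),
        (stat.S_ISFIFO, "named pipe"),
        (stat.S_ISSOCK, "socket"),
    ]
    for predicate, name in checks:
        if predicate(mode):
            return name
    return "unknown"


if __name__ == "__main__":
    print(unix_file_type("."))
```

Only a regular file is worth opening and inspecting further; for the other types the answer is already known at this point.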

The magic tests are a bit more complex. File types are guessed with the help of a database of patterns called the magic file. Some file types can be determined by reading a bit or a number at a particular place within the file (binaries, for example). The magic file lists "magic numbers" to look for in the file and the text to print when they are found. Those "magic numbers" can be 1- to 4-byte values, strings, dates or even regular expressions. Further tests can reveal additional information. For an executable, that would be whether it is dynamically linked or not, stripped or not, and its architecture. Sometimes multiple tests must pass before the file type can be identified with confidence. Either way, no matter how many tests are performed, the result is always just a good guess.

Here are the first 8 bytes of files of some common types, to give a feeling for what these magic numbers can look like:

             Hexadecimal          ASCII
PNG   89 50 4E 47|0D 0A 1A 0A   ‰PNG|....
JPG   FF D8 FF E1|1D 16 45 78   ÿØÿá|..Ex
JPG   FF D8 FF E0|00 10 4A 46   ÿØÿà|..JF
ZIP   50 4B 03 04|0A 00 00 00   PK..|....
PDF   25 50 44 46|2D 31 2E 35   %PDF|-1.5
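The table above can be turned into a toy matcher. This is only a sketch: the SIGNATURES list and guess_by_magic are names invented here, and the real magic file is far richer (offsets, masks, nested tests) than a list of prefixes.

```python
# Leading bytes taken from the table above; labels are illustrative.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"%PDF-", "PDF document"),
]


def guess_by_magic(data):
    """Return a label if data starts with a known signature, else None."""
    for prefix, label in SIGNATURES:
        if data.startswith(prefix):
            return label
    return None


if __name__ == "__main__":
    print(guess_by_magic(bytes.fromhex("89 50 4E 47 0D 0A 1A 0A")))
```

Note that both JPG rows share the prefix FF D8 FF, so one entry covers them.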

If the magic tests find no match, file checks whether the file looks like a text file and, if so, tries to determine the encoding of its contents. The encoding is distinguished by the different ranges and sequences of bytes that constitute printable text in each character set.
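A rough Python sketch of that idea, assuming only ASCII and UTF-8 are of interest. The names TEXT_CONTROLS, looks_like_text and guess_text_encoding are invented for this example, and the set of allowed control characters is a simplification, not the list file really uses.

```python
# Control characters that commonly appear in plain text (assumed, simplified set).
TEXT_CONTROLS = {"\a", "\b", "\t", "\n", "\f", "\r", "\x1b"}


def looks_like_text(chars):
    """True if the decoded string has no control characters outside TEXT_CONTROLS."""
    for c in chars:
        if c in TEXT_CONTROLS:
            continue
        if ord(c) < 0x20 or ord(c) == 0x7F:
            return False
    return True


def guess_text_encoding(data):
    """Return 'ASCII text', 'UTF-8 text', 'empty', or None if data is not text-like."""
    if not data:
        return "empty"
    for codec, label in (("ascii", "ASCII text"), ("utf-8", "UTF-8 text")):
        try:
            decoded = data.decode(codec)
        except UnicodeDecodeError:
            continue  # byte sequences are not valid in this encoding
        if looks_like_text(decoded):
            return label
    return None
```

ASCII is tried first because every ASCII file is also valid UTF-8; the narrower answer is the more informative one.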

The line terminators are examined as well, by their hex values:

  • 0A (\n) indicates a file with Un*x/Linux/BSD/OSX line endings
  • 0D 0A (\r\n) indicates a file from Microsoft operating systems
  • 0D (\r) indicates Mac OS up to version 9
  • 15 (\025) is the EBCDIC newline (NL), as used on IBM mainframe systems
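Counting the ASCII-based terminators is simple enough to sketch in Python. The function name line_terminators is made up here, and the 0x15 case is left out because it only makes sense for non-ASCII data.

```python
def line_terminators(data):
    """Return the kinds of line terminators found in data, e.g. ['CRLF', 'LF']."""
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf  # lone CR, not part of a CRLF pair
    lf = data.count(b"\n") - crlf  # lone LF, not part of a CRLF pair
    found = []
    if crlf:
        found.append("CRLF")
    if cr:
        found.append("CR")
    if lf:
        found.append("LF")
    return found
```

A file that mixes styles yields more than one entry, which is worth reporting rather than hiding.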

Finally come the language tests. If the file appears to be a text file, it is searched for particular strings to find out which language it contains (C, Perl, Bash). Some scripting languages can also be identified by the hashbang (#!/bin/interpreter) on the first line of the script.
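The hashbang check in particular is easy to sketch. The function below (shebang_interpreter, a name invented here) only looks at the first line and makes the simplifying assumption that "env" is followed directly by the interpreter name, with no options in between.

```python
def shebang_interpreter(data):
    """Return the interpreter named on a '#!' first line, or None if there is none."""
    if not data.startswith(b"#!"):
        return None
    first_line = data[2:].split(b"\n", 1)[0]
    parts = first_line.split()
    if not parts:
        return None
    program = parts[0].decode("utf-8", "replace")
    # '#!/usr/bin/env python3' names the interpreter as the argument to env.
    if program.rsplit("/", 1)[-1] == "env" and len(parts) > 1:
        return parts[1].decode("utf-8", "replace")
    return program
```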

If none of these tests match, the file type can't be determined and file just prints "data".

So, as you can see, there is no need for a suffix. A suffix could even be misleading if it is set wrong.


Often, the system doesn't care. You just pass the file to a program and either the program can interpret it or it can't. It may not be useful to open a .jpg in a text editor, but you're not prevented from doing so. The extension, like the rest of the filename, is for the organisational convenience of humans.

It may also be possible to construct files that can be validly interpreted in multiple ways. Because the ZIP file format keeps its directory at the end of the file, you can prepend other data to the front and it will still load as a ZIP file. This is commonly used to make self-extracting zip files.
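This is easy to try with Python's zipfile module. The sketch below builds a small archive in memory, prepends a fake shell stub, and reads the archive back; make_zip and read_with_prefix are helper names invented for the example, and the stub does not actually extract anything.

```python
import io
import zipfile


def make_zip(files):
    """Build a ZIP archive in memory from a {name: text} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()


def read_with_prefix(prefix, files):
    """Prepend prefix to a fresh archive and read every member back."""
    combined = prefix + make_zip(files)
    with zipfile.ZipFile(io.BytesIO(combined)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


if __name__ == "__main__":
    stub = b"#!/bin/sh\necho 'pretend extractor'\nexit 0\n"
    print(read_with_prefix(stub, {"hello.txt": "hi there\n"}))
```

The combined bytes start with #! like a script and still open as a ZIP archive, so which type the file "is" depends on who is asking.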


That information is commonly found in the header of the file. The file command analyzes the target and reports what it finds. Much of that is derived from the file header, which is usually the first few bytes of the file (see below). Headers are used by the system to figure out how to handle files. #!/bin/bash at the beginning of a file tells the system to use the bash shell to interpret the script that follows. ELF tells the system that this is an ELF executable.

[~] root@www # file /bin/ls
/bin/ls: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped

[~] root@www # file /etc/passwd
/etc/passwd: ASCII text

File header examples:

[root@server4 ~]# xxd old_sm_logo.png | head -5
0000000: 8950 4e47 0d0a 1a0a 0000 000d 4948 4452  .PNG........IHDR
0000010: 0000 0134 0000 006f 0806 0000 0062 bf3c  ...4...o.....b.<

[root@server4 ~]# xxd /bin/ls | head -5
0000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
0000010: 0200 3e00 0100 0000 a024 4000 0000 0000  ..>......$@.....

[root@server4 proj]# xxd resizer.sh | head -5
0000000: 2321 2f62 696e 2f62 6173 680a 5b20 2d7a  #!/bin/bash.[ -z
0000010: 2022 2431 2220 5d20 2626 2065 6368 6f20   "$1" ] && echo
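As a worked example, the first 32 bytes of the /bin/ls dump above are enough to reproduce the start of what file printed for it. This Python sketch decodes only a handful of ELF header fields; describe_elf and the small lookup tables are illustrative and far from complete.

```python
import struct

ELF_CLASS = {1: "32-bit", 2: "64-bit"}   # byte 4 of the header
ELF_DATA = {1: "LSB", 2: "MSB"}          # byte 5: byte order
ELF_TYPE = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core file"}
ELF_MACHINE = {0x03: "Intel 80386", 0x28: "ARM", 0x3E: "x86-64", 0xB7: "ARM aarch64"}


def describe_elf(header):
    """Describe an ELF header from its first 20+ bytes, or return None if not ELF."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    ei_class, ei_data = header[4], header[5]
    byte_order = ">" if ei_data == 2 else "<"
    # e_type and e_machine are two 16-bit fields starting at offset 16.
    e_type, e_machine = struct.unpack_from(byte_order + "HH", header, 16)
    return "ELF {} {} {}, {}".format(
        ELF_CLASS.get(ei_class, "unknown-class"),
        ELF_DATA.get(ei_data, "unknown-order"),
        ELF_TYPE.get(e_type, "unknown-type"),
        ELF_MACHINE.get(e_machine, "unknown-machine"),
    )


if __name__ == "__main__":
    dump = bytes.fromhex(
        "7f45 4c46 0201 0100 0000 0000 0000 0000"
        "0200 3e00 0100 0000 a024 4000 0000 0000"
    )
    print(describe_elf(dump))
```

Details such as "dynamically linked" or "stripped" need the program and section headers further into the file, which this sketch does not read.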