How can I get the MIME type of an InputStream of a file that is being uploaded?
I'm a big proponent of "do it yourself first, then look for a library solution". Luckily, this is a case where doing it yourself is feasible.
You have to know the file's "magic number", i.e. its signature.
Let me give an example for detecting whether the InputStream represents a PNG file.
The PNG signature is composed by concatenating the following bytes (in hex):
1) error-checking byte - 0x89
2) string "PNG" in ASCII:
   P - 0x50
   N - 0x4E
   G - 0x47
3) CR (carriage return) - 0x0D
4) LF (line feed) - 0x0A
5) SUB (substitute) - 0x1A
6) LF (line feed) - 0x0A
So, the magic number is:
89 50 4E 47 0D 0A 1A 0A (hex)
137 80 78 71 13 10 26 10 (decimal)
-119 80 78 71 13 10 26 10 (as signed bytes in Java)
Explanation of the 137 -> -119 conversion:
An N-bit number can represent 2^N different values. For a byte (8 bits) that is 2^8 = 256 values, i.e. the range 0..255. Java considers byte primitives to be signed, so the range is -128..127 instead. Thus, 137 is interpreted as a signed value and represents -119 = 137 - 256.
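To make the conversion concrete, here is a small Java sketch. The class and method names (SignedByteDemo, toUnsigned) are made up for illustration; the cast, the `& 0xFF` mask and `Byte.toUnsignedInt` are plain JDK behaviour.

```java
final class SignedByteDemo {
    // Interpret a Java byte (-128..127) as its unsigned value (0..255).
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte first = (byte) 0x89;                      // narrowing cast keeps the low 8 bits
        System.out.println(first);                     // -119
        System.out.println(toUnsigned(first));         // 137
        System.out.println(Byte.toUnsignedInt(first)); // 137, same result via the JDK helper
    }
}
```

So you can either compare against -119 directly, or mask each byte with 0xFF and compare against the values from the specification.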
Example in Kotlin
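private fun InputStream.isPng(): Boolean {
    val magicNumbers = byteArrayOf(-119, 80, 78, 71, 13, 10, 26, 10)
    val signatureBytes = ByteArray(magicNumbers.size)
    // read() may return fewer bytes than requested, so keep reading until the buffer is full
    var offset = 0
    while (offset < signatureBytes.size) {
        val n = read(signatureBytes, offset, signatureBytes.size - offset)
        if (n < 0) return false // stream ended before 8 bytes were available
        offset += n
    }
    return signatureBytes.contentEquals(magicNumbers)
}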
Of course, in order to support many MIME types, you have to scale this solution somehow, and if you are not happy with the result, consider using a library.
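One possible way to scale it is a table of signatures that is checked in order. The following Java sketch is only an illustration: the class name MagicNumberSniffer and the tiny table are my own, and it assumes the stream supports mark/reset (wrap it in a BufferedInputStream if it does not) so that the bytes are still there for whoever reads the upload afterwards.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;

final class MagicNumberSniffer {
    // Illustrative table: MIME type -> signature bytes written as unsigned values.
    private static final Map<String, int[]> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("image/png", new int[] {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A});
        SIGNATURES.put("image/gif", new int[] {0x47, 0x49, 0x46, 0x38});       // "GIF8"
        SIGNATURES.put("application/pdf", new int[] {0x25, 0x50, 0x44, 0x46}); // "%PDF"
        SIGNATURES.put("image/jpeg", new int[] {0xFF, 0xD8, 0xFF});
    }
    private static final int MAX_SIGNATURE_LENGTH = 8;

    // Returns the matching MIME type, or null if no signature matches.
    static String sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(MAX_SIGNATURE_LENGTH);
        byte[] head = new byte[MAX_SIGNATURE_LENGTH];
        int filled = 0;
        while (filled < head.length) { // read() may deliver fewer bytes than requested
            int n = in.read(head, filled, head.length - filled);
            if (n < 0) break;
            filled += n;
        }
        in.reset(); // rewind so the caller can still read the whole stream
        for (Map.Entry<String, int[]> entry : SIGNATURES.entrySet()) {
            if (startsWith(head, filled, entry.getValue())) {
                return entry.getKey();
            }
        }
        return null;
    }

    private static boolean startsWith(byte[] data, int length, int[] signature) {
        if (length < signature.length) return false;
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) return false; // mask to compare unsigned
        }
        return true;
    }
}
```

Usage would look like `String mime = MagicNumberSniffer.sniff(new BufferedInputStream(uploadStream));`, where uploadStream stands for whatever stream you received. Once the table grows beyond a handful of entries, a library is probably the better choice.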
I wrote my own content-type detector for a byte[] because the libraries above weren't suitable or I didn't have access to them. Hopefully this helps someone out.
// retrieve file as byte[]
byte[] b = odHit.retrieve( "" );
// copy the first 32 bytes (or fewer, if the file is shorter) and pass them to the guessMimeType(byte[]) function
byte[] topOfStream = java.util.Arrays.copyOf(b, Math.min(b.length, 32));
String mimeGuess = guessMimeType(topOfStream);
...
// Needs: java.io.FileInputStream, java.io.IOException, java.io.InputStream,
// java.nio.charset.StandardCharsets, java.util.Properties
private static String guessMimeType(byte[] topOfStream) {
    String mimeType = null;
    Properties magicmimes = new Properties();
    // Read in the magicmimes.properties file (example listed below);
    // try-with-resources closes the stream even if load() fails
    try (InputStream in = new FileInputStream("magicmimes.properties")) {
        magicmimes.load(in);
    } catch (IOException e) {
        e.printStackTrace();
    }
    // Loop over each file signature; if a match is found, return its MIME type
    for (String key : magicmimes.stringPropertyNames()) {
        if (key.length() > topOfStream.length) {
            continue; // signature is longer than the data we were given
        }
        // ISO-8859-1 maps every byte to the char with the same value, so bytes
        // such as 0x89 compare correctly against the escaped keys
        String sample = new String(topOfStream, 0, key.length(), StandardCharsets.ISO_8859_1);
        if (key.equals(sample)) {
            mimeType = magicmimes.getProperty(key);
            System.out.println("Mime Found! " + mimeType);
            break;
        } else {
            System.out.println("trying " + key + " == " + sample);
        }
    }
    return mimeType;
}
magicmimes.properties file example (not sure these signatures are correct, but they worked for my uses)
# SignatureKey content/type
\u0000\u201E\u00f1\u00d9 text/plain
\u0025\u0050\u0044\u0046 application/pdf
%PDF application/pdf
\u0042\u004d image/bmp
GIF8 image/gif
\u0047\u0049\u0046\u0038 image/gif
\u0049\u0049\u002A\u0000 image/tiff
\u004D\u004D\u0000\u002A image/tiff
\u0089\u0050\u004e\u0047 image/png
\u00ff\u00d8\u00ff\u00e0 image/jpeg
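For anyone wondering how the escaped keys end up matching raw file bytes: Properties.load expands the Unicode escapes into chars, and decoding the file's leading bytes as ISO-8859-1 produces chars with the same numeric values. A minimal sketch of that idea follows; the class MagicMimesKeyDemo and its helper names are hypothetical, and it parses the properties text from a String instead of a file to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class MagicMimesKeyDemo {
    // Loads "signature mime/type" pairs from properties-formatted text.
    static Properties parse(String propertiesText) throws IOException {
        Properties p = new Properties();
        // load(InputStream) reads ISO-8859-1 and expands Unicode escapes in keys and values
        p.load(new ByteArrayInputStream(propertiesText.getBytes(StandardCharsets.ISO_8859_1)));
        return p;
    }

    // Turns the leading bytes of a file into a String whose chars equal the byte values (0..255).
    static String asKey(byte[] data, int length) {
        return new String(data, 0, length, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        Properties p = parse("\\u0089\\u0050\\u004e\\u0047 image/png\nGIF8 image/gif\n");
        byte[] head = {(byte) 0x89, 'P', 'N', 'G'};
        System.out.println(p.getProperty(asKey(head, 4))); // image/png
    }
}
```

Decoding with the platform default charset instead (for example UTF-8) would turn a byte like 0x89 into a replacement character, and the lookup would fail.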
It depends on where you are getting the input stream from. If you are getting it from a servlet, the MIME type is accessible through the HttpServletRequest object that is an argument of doPost. If you are using a REST framework such as Jersey, the request can be injected with @Context. If you are uploading the file through a raw socket, it is your responsibility to specify the MIME type as part of your protocol, since you will not inherit the HTTP headers.
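For the raw socket case, one simple option is to send a small header before the file bytes. The sketch below is just one possible framing, not a standard: the class UploadFraming and the header layout (MIME type string, then payload length, then payload) are assumptions made for illustration. In real code the streams would come from socket.getOutputStream() and socket.getInputStream().

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class UploadFraming {
    // Simple holder for what the receiver reads off the wire.
    static final class Upload {
        final String mimeType;
        final byte[] payload;

        Upload(String mimeType, byte[] payload) {
            this.mimeType = mimeType;
            this.payload = payload;
        }
    }

    // Sender side: write the MIME type and payload length before the payload itself.
    static void send(OutputStream rawOut, String mimeType, byte[] payload) throws IOException {
        DataOutputStream out = new DataOutputStream(rawOut);
        out.writeUTF(mimeType);       // length-prefixed string
        out.writeInt(payload.length); // 4-byte big-endian length
        out.write(payload);
        out.flush();
    }

    // Receiver side: read the fields back in the same order they were written.
    static Upload receive(InputStream rawIn) throws IOException {
        DataInputStream in = new DataInputStream(rawIn);
        String mimeType = in.readUTF();
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("negative payload length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload); // blocks until all bytes arrive or throws EOFException
        return new Upload(mimeType, payload);
    }
}
```

The declared type is only what the sender claims, so you may still want to check the payload's magic number on the receiving side.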
According to Real Gagnon's excellent site, the better solution for your case would be to use Apache Tika.