Why use bin2hex when inserting binary data from PHP into MySQL?
A hex string is significantly longer than the corresponding binary string. Simply the transfer time and the copying it around inside PHP's and MySQL's memory may do the trick.
In all honesty I'm not expert on the underlying implementation, but wouldn't it be better to not pass the data inside the SQL at all, but using e.g. PDOStatement
's parameter binding? Maybe someone more knowledgeable here can confirm whether that will indeed cause the data to be sent as a binary string, outside any SQL statement at all, or whether PDO just does the escaping and query string manipulation under the hood.
Either way, you get a security (and simplicity) benefit right there.
This sounds like an urban legend to me.
bin2hex()
maps each byte in the input to two bytes in the output ('a'
-> '61'
), so you should notice a significant memory increase of the script performing the query - it should use at least as much memory more as the byte length of the binary data to be inserted.
Furthermore, this implies that running bin2hex()
on a long string takes much longer than running mysql_real_escape string()
, which - as explained in MySQL's documentation - just escapes 6 characters: NULL
, \r
, \n
, \
, ,
and 'Control-Z'.
That was for the PHP part, now for MySQL: The server needs to do the reverse operation to store the data correctly. Reversing either of the functions takes almost as long as the original operation - the reverse function of mysql_real_escape_string()
needs to replace escaped values (\\
) with unescaped ones (\
), whereas the reverse of bin2hex()
would need to replace each and every byte tuple with a new byte.
Since calling mysql_real_escape_string()
on binary data is safe (according to MySQL's and PHP's documentation or even when just considering that the operation does not do any other conversions than the ones listed above), it would make absolutely no sense to perform such a costly operation.
I've been testing this myself, and I've come up with pretty consistent results. (Even though my tests are a tad crude.)
I've tested three computers
- Windows 7 (x64), PHP 5.3, MySQL 5.1
- Ubuntu 9.10 (x64) PHP 5.2, MySQL 5.1
- Ubuntu 10.04 (x32) PHP 5.3, MySQL 5.1
So far the tests on all three platforms have indicated the same tings:
- Inserting into a BLOB is 2x to 8x faster on MyISAM than on InnoDB. The difference seems to be higher on binary strings than HEX-coded strings. (See the data below)
- Using a HEX coded string (
bin2hex
into aX'...'
) uses more memory, on average, than using an escaped binary string (mysql_real_escape_string
on the raw data). - This seems true for both MyISAM and InnoDB. - The binary string is faster on MyISAM, but the HEX-coded data is faster on InnoDB.
The test was basically just a simple loop that escaped or hex-coded the raw data (a 2.4 MiB image retrieved once at the top of the script), constructed the query string, and executed it via the mysql_query
or mysqli::query
functions. - I tested with both extensions. Didn't seem to be any difference.
I put the results from the Ubuntu 10.04 (#3) up in spreadsheets. The results from the Ubuntu 9.10 (#2) machine were pretty much the same, so I didn't bother set them up:
(Finally an excuse to test the Google Docs thing properly! xD)
- Binary string on MyISAM
- HEX-coded string on MyISAM
- Binary string on InnoDB
- Hex-coded string on InnoDB
These graphs show the private memory usage by the mysqld
process on the Win7 (#1) machine.
- Binary string on MyISAM
- HEX-coded string on MyISAM
- Binary string on InnoDB
- HEX-coded string on InnoDB