mySQL: get hash value for each row?
Well I made a little script that could do excactly what you want, and maybe what others want... so here it goes...for PHP that is... first you have to make a list of columns of the table, then you make a "case when" statement for each column based on their type and put that in the concat_ws statement and then you hash it with sha1...i've used this method on very large tables (600000+ records) and the speed is quite good when selecting all records. also I think that it is faster to concat the required data in a concat_ws and explode it in php or whatever you are using, but that is just a hunch...
<?
$query= mysql_query("SHOW COLUMNS FROM $table", $linklive);
while ($col = mysql_fetch_assoc($query)) {
$columns[] = mysql_real_escape_string($col['Field']);
if ($col['Key'] == 'PRI') {
$key = mysql_real_escape_string($col['Field']);
}
$columnsinfo[$col['Field']] = $col;
}
$dates = array("date","datetime","time");
$int = array("int","decimal");
$implcols = array();
foreach($columns as $col){
if(in_array($columnsinfo[$col]['Type'], $dates)){
$implcols[] = "(CASE WHEN (UNIX_TIMESTAMP(`$col`)=0 || `$col` IS NULL) THEN '[$col EMPTY]' ELSE `$col` END)";
}else{
list($type, $rest) = explode("(",$columnsinfo[$col]['Type']);
if(in_array($columnsinfo[$col]['Type'], $dates)){
$implcols[] = "(CASE WHEN ( `$col`=0 || `$col` IS NULL ) THEN '[$col EMPTY]' ELSE `$col` END)";
}else{
$implcols[] = "(CASE WHEN ( `$col`='' || `$col` IS NULL ) THEN '[$col EMPTY]' ELSE `$col` END)";
}
}
}
$keyslive = array();
//echo "SELECT $key SHA1(CONCAT_WS('',".implode(",", $columns).")) as compare FROM $table"; exit;
$q = "SELECT $key as `key`, SHA1(CONCAT_WS('',".implode(", ",$implcols).")) as compare FROM $table";
?>
It's better to use concat_ws(). e.g. two adjacent column: 12,3 => 1,23 .
Sorry, this still has some problems. Think about the null value, empty string, string can contain ',', etc...
A program is required to generate the hash statement, which should replace null to specific value (for null-able columns), and also use the seldom used char/byte as separator.
There are problems with CONCAT
, e.g. CONCAT('ab', 'c')
vs CONCAT('a', 'bc')
. Two different rows, but result is the same. You could use CONCAT_WS(';', 'ab', 'c')
to get ab;c
but in case of CONCAT_WS(';', ';', '')
vs CONCAT_WS(';', '', ';')
you still get the same result.
Also CONCAT(NULL, 'c')
returns NULL
.
I think the best way is to use QUOTE
:
SELECT MD5(CONCAT(QUOTE(c1), QUOTE(c2), QUOTE(c3))) AS row_hash FROM t1;
Result of: select (concat(quote('a'), quote('bc'), quote('NULL'), quote(NULL), quote('\''), quote('')));
is: 'a''bc''NULL'NULL'\''''
Also, don't use GROUP_CONCAT() to get hash of table, it has limit: https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_group_concat_max_len
Instead, CHECKSUM TABLE
might be better, but you can't skip columns with CHECKSUM TABLE
https://dev.mysql.com/doc/refman/5.7/en/checksum-table.html
you could do something like
SELECT MD5(concat(field1, field2, field3, ...)) AS rowhash
but you can't get away from listing which fields you want, as concat(*)
is not an option (syntax error).