mySQL: get hash value for each row?

Well I made a little script that could do excactly what you want, and maybe what others want... so here it goes...for PHP that is... first you have to make a list of columns of the table, then you make a "case when" statement for each column based on their type and put that in the concat_ws statement and then you hash it with sha1...i've used this method on very large tables (600000+ records) and the speed is quite good when selecting all records. also I think that it is faster to concat the required data in a concat_ws and explode it in php or whatever you are using, but that is just a hunch...

<?
$query= mysql_query("SHOW COLUMNS FROM $table", $linklive);
        while ($col = mysql_fetch_assoc($query)) {
            $columns[] = mysql_real_escape_string($col['Field']);
            if ($col['Key'] == 'PRI') {
                $key = mysql_real_escape_string($col['Field']);
            }
            $columnsinfo[$col['Field']] = $col;
        }
        $dates = array("date","datetime","time");
                    $int = array("int","decimal");
                    $implcols = array();
                    foreach($columns as $col){
                        if(in_array($columnsinfo[$col]['Type'], $dates)){
                            $implcols[] = "(CASE WHEN (UNIX_TIMESTAMP(`$col`)=0 || `$col` IS NULL) THEN '[$col EMPTY]' ELSE `$col` END)";
                        }else{
                            list($type, $rest) = explode("(",$columnsinfo[$col]['Type']);
                            if(in_array($columnsinfo[$col]['Type'], $dates)){
                                $implcols[] = "(CASE WHEN ( `$col`=0 || `$col` IS NULL ) THEN '[$col EMPTY]' ELSE `$col` END)";
                            }else{
                                $implcols[] = "(CASE WHEN ( `$col`='' || `$col` IS NULL ) THEN '[$col EMPTY]' ELSE `$col` END)";
                            }
                        }
                    }
                    $keyslive = array();
                    //echo "SELECT $key SHA1(CONCAT_WS('',".implode(",", $columns).")) as compare FROM $table"; exit;
                    $q = "SELECT $key as `key`, SHA1(CONCAT_WS('',".implode(", ",$implcols).")) as compare FROM $table";
    ?>

It's better to use concat_ws(). e.g. two adjacent column: 12,3 => 1,23 .

Sorry, this still has some problems. Think about the null value, empty string, string can contain ',', etc...

A program is required to generate the hash statement, which should replace null to specific value (for null-able columns), and also use the seldom used char/byte as separator.


There are problems with CONCAT, e.g. CONCAT('ab', 'c') vs CONCAT('a', 'bc'). Two different rows, but result is the same. You could use CONCAT_WS(';', 'ab', 'c') to get ab;c but in case of CONCAT_WS(';', ';', '') vs CONCAT_WS(';', '', ';') you still get the same result.

Also CONCAT(NULL, 'c') returns NULL.

I think the best way is to use QUOTE:

SELECT MD5(CONCAT(QUOTE(c1), QUOTE(c2), QUOTE(c3))) AS row_hash FROM t1;

Result of: select (concat(quote('a'), quote('bc'), quote('NULL'), quote(NULL), quote('\''), quote('')));

is: 'a''bc''NULL'NULL'\''''

Also, don't use GROUP_CONCAT() to get hash of table, it has limit: https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_group_concat_max_len

Instead, CHECKSUM TABLE might be better, but you can't skip columns with CHECKSUM TABLE https://dev.mysql.com/doc/refman/5.7/en/checksum-table.html


you could do something like

SELECT MD5(concat(field1, field2, field3, ...)) AS rowhash

but you can't get away from listing which fields you want, as concat(*) is not an option (syntax error).

Tags:

Mysql

Hash