How to generate strings that share the same hashcode in Java?
I think find a equal-hash string from a long string is too hard, it's easy when find equal-hash string of an short string (2 or 3). Look at the equation below. (sorry I cant post image cause me new member)
Notice that, "FB" and "Ea" have the same hashcode, and any two strings like s1+"FB"+s2 and s1+"Ea"+s2 will have the same hashcode. So, the easy solution is finding any 2-char substring of existing string and replace with a 2-char substring with the same hashcode
Exmaple, we have the string "helloworld" get 2-char substring "he", hashcode("he") = 'h'*31 + 'e' = ('h'*31 + 31) + ('e' - 31) = ('h'+1)*31 + 'F' = 'i' + 'F' = hashcode("iF") so the desire string is "iFlloworld" we have increased 'h' by 1, we can increase by 2, or 3 etc (but will be wrong if it overflow the char value)
The below code run well with small level, it will wrong if the level is big, make the char value overflow, I will fix it later if you want (this code change 2 first chars, but I will edit code to 2 last chars because 2 first chars are calc with largest value)
public static String samehash(String s, int level) {
if (s.length() < 2)
return s;
String sub2 = s.substring(0, 2);
char c0 = sub2.charAt(0);
char c1 = sub2.charAt(1);
c0 = (char) (c0 + level);
c1 = (char) (c1 - 31 * level);
String newsub2 = new String(new char[] { c0, c1 });
String re = newsub2 + s.substring(2);
return re;
}
see a test method, basically, so long as you match, a1*31+b1 = a2*31 +b2, which means (a1-a2)*31=b2-b1
public void testHash()
{
System.out.println("A:" + ((int)'A'));
System.out.println("B:" + ((int)'B'));
System.out.println("a:" + ((int)'a'));
System.out.println(hash("Aa".hashCode()));
System.out.println(hash("BB".hashCode()));
System.out.println(hash("Aa".hashCode()));
System.out.println(hash("BB".hashCode()));
System.out.println(hash("AaAa".hashCode()));
System.out.println(hash("BBBB".hashCode()));
System.out.println(hash("AaBB".hashCode()));
System.out.println(hash("BBAa".hashCode()));
}
you will get
A:65
B:66
a:97
2260
2260
2260
2260
2019172
2019172
2019172
2019172
edit: someone said this is not straightforward enough. I added below part
@Test
public void testN() throws Exception {
List<String> l = HashCUtil.generateN(3);
for(int i = 0; i < l.size(); ++i){
System.out.println(l.get(i) + "---" + l.get(i).hashCode());
}
}
AaAaAa---1952508096
AaAaBB---1952508096
AaBBAa---1952508096
AaBBBB---1952508096
BBAaAa---1952508096
BBAaBB---1952508096
BBBBAa---1952508096
BBBBBB---1952508096
below is the source code, it might be not efficient, but it work:
public class HashCUtil {
private static String[] base = new String[] {"Aa", "BB"};
public static List<String> generateN(int n)
{
if(n <= 0)
{
return null;
}
List<String> list = generateOne(null);
for(int i = 1; i < n; ++i)
{
list = generateOne(list);
}
return list;
}
public static List<String> generateOne(List<String> strList)
{
if((null == strList) || (0 == strList.size()))
{
strList = new ArrayList<String>();
for(int i = 0; i < base.length; ++i)
{
strList.add(base[i]);
}
return strList;
}
List<String> result = new ArrayList<String>();
for(int i = 0; i < base.length; ++i)
{
for(String str: strList)
{
result.add(base[i] + str);
}
}
return result;
}
}
look at String.hashCode()
public int hashCode() {
int h = hash;
if (h == 0) {
int off = offset;
char val[] = value;
int len = count;
for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
hash = h;
}
return h;
}
I was wondering if there was a "universal" solution; e.g. some constant string XYZ
, such that
s.hashCode() == (s + XYZ).hashCode()
for any string s
. Finding such a string involves solving a fairly complicated equation ... which was beyond my rusty mathematical ability. But then it dawned on me that h == 31*h + ch
is always true
when h
and ch
are both zero!
Based on that insight, the following method should create a different String with the same hashcode as its argument:
public String collider(String s) {
return "\0" + s;
}
If NUL characters are problematic for you, prepending any string whose hashcode is zero would work too ... albeit that the colliding strings would be longer than if you used zero.