Using COLLATE in Android SQLite - Locales is ignored in LIKE statement
In Android sqlite, LIKE
and GLOB
ignore both COLLATE LOCALIZED
and COLLATE UNICODE
(they only work for ORDER BY
). However, as @asat explains in his answer, you can use GLOB
with a pattern that will replace each letter with all the available alternatives of that letter. In Java:
public static String addTildeOptions(String searchText) {
return searchText.toLowerCase()
.replaceAll("[aáàäâã]", "\\[aáàäâã\\]")
.replaceAll("[eéèëê]", "\\[eéèëê\\]")
.replaceAll("[iíìî]", "\\[iíìî\\]")
.replaceAll("[oóòöôõ]", "\\[oóòöôõ\\]")
.replaceAll("[uúùüû]", "\\[uúùüû\\]")
.replace("*", "[*]")
.replace("?", "[?]");
}
And then (not literally like this, of course):
SELECT * from table WHERE lower(column) GLOB "*addTildeOptions(searchText)*"
This way, for example in Spanish, a user searching for either mas or más will get the search converted into m[aáàäâã]s, returning both results.
It is important to notice that GLOB
ignores COLLATE NOCASE
, that's why I converted everything to lower case both in the function and in the query. Notice also that the lower()
function in sqlite doesn't work on non-ASCII characters - but again those are probably the ones that you are already replacing!
The function also replaces both GLOB
wildcards, *
and ?
, with "escaped" versions.
Have you had a look at the SQLite documentation for LIKE? It has come information about non ASCII characters and a bug. Maybe Android has an older version of SQLite installed where this is a problem.
I think the second normalised column might be your best option unfortunately.
Just today i had exactly the same task as you had. And in my situation making additional shadow columns is not a case because i'm having to search more than one column. So i came to the solution like this, which is tested in real project. In my case i'm handling only lower case letters but you can extend the function with upper case letters as well.
db.setLocale(Locale("cz", "CZ"))
val query = "SELECT * FROM table WHERE name GLOB ${getExpr(str)} ORDER BY name COLLATE LOCALIZED ASC"
private fun getExpr(input: String) : String{
var expr = ""
for(lettter in input){
expr += when(lettter){
's','š' -> "[sš]"
'a','á' -> "[aá]"
'e','ě','é' -> "[eěé]"
'i','í' -> "[ií]"
'z','ž' -> "[zž]"
'c','č' -> "[cč]"
'y','ý' -> "[yý]"
'r','ř' -> "[rř]"
'u','ů','ú' -> "[uůú]"
'o','ó' -> "[oó]"
'n','ň' -> "[nň]"
'd','ď' -> "[dď]"
't','ť' -> "[tť]"
else -> lettter
}
}
return "'*${expr}*'"
}
Creating a second normalised column can be used to go around limitations (as mentioned briefly in other answers).
This means in practice that you have to create another (shadow) column of your first where the same data in a fixed case (e.g. all upper chars) is stored. Case insensitive queries (including like queries) can be made on this new column with search values in the same case.
If the first column "a" contains
AAA
aaa
Bbb
äää
ééé
The second column a_shadow would contain for the same rows
AAA
AAA
BBB
ÄÄÄ
ÉÉÉ
and your original query (example) "select a from mytable where a='äää'"
would be replaced with "select a from mytable where A='ÄÄÄ'"
Your code needs to be updated to fill the converted shadow content when adding the primary content. If the column is added after creation or you cannot change the code existing values may need to be converted using an update query. Example:
UPDATE mytable SET a_shadow=UPPER(a);