Searching for diacritics / accented characters with the `locate` command
If we take a look at updatedb.conf(5), we'll find that there is not much we can do with its configuration items.
So we are going to write a script around locate. At the end we will be able to run something like my-locate.sh liberacion or my-locate.sh liberâciòn, and it will bring us all the possible combinations.
Let's start.
First, create a simple file to serve as our database, anywhere you want, e.g. ~/.mydb; then add your accented characters to that file, one line per base letter followed by its variants, like this:
aâàáäÂÀÂÄ
eêèéëÊÈÉË
iîïíÎÏ
uûùüÛÜÙ
cçÇ
oôöóÔÖóòòò
...
...
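As a rough sketch of this step, the database can also be written with a here-document. The helper name make_accent_db is hypothetical, and the three lines below are only a shortened sample of the list above; extend it with whatever letters you need.

```shell
#!/bin/bash
# make_accent_db FILE  (hypothetical helper, shown only for illustration)
# Writes a small sample database: one line per base letter,
# the plain letter first, followed by its accented variants.
make_accent_db() {
    cat > "$1" <<'EOF'
eêèéëÊÈÉË
iîïíÎÏ
cçÇ
EOF
}
```

Calling make_accent_db ~/.mydb creates the file (overwriting any existing one); grep -F -m1 -- c ~/.mydb then returns the line that holds every variant of "c".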
Then we need a small script that does the job for us. I wrote a simple one:
#!/bin/bash
# Final search term
STR=""
# Loop through all characters of the desired string
for (( i=0; i<${#1}; i++ )); do
    # Take one character of the string
    CH="${1:$i:1}"
    # Find all variants of this char: the first database line that contains it
    # (-F treats the char literally, so "." or "*" do not match every line)
    CHARS=$(grep -F -m1 -- "$CH" ~/.mydb)
    if [ -z "$CHARS" ]; then
        # No variants known: keep the character as it is
        STR=$STR$CH
    else
        # Put an "or" operator between the variants; drop the trailing "|",
        # which would otherwise let the group match an empty string
        REG=$(echo "$CHARS" | sed 's/./&|/g; s/|$//')
        # Append all variants of this character to our final search term as an or-group
        STR="$STR($REG)"
    fi
done
# locate it using an extended regex, anchored at the end of the path
locate --regex "$STR$"
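To inspect the pattern before handing it to locate, the same idea can be wrapped in a function that only prints the regex. This is a minimal sketch, not the script above: the name accent_regex is hypothetical, and it builds one bracket expression per character instead of an or-group. It assumes a UTF-8 locale and a database that holds only letters (no "]", "^" or "-").

```shell
#!/bin/bash
# accent_regex WORD [DB]  (hypothetical helper, not part of locate)
# Prints an extended regex in which every character of WORD that appears
# in DB is replaced by a bracket expression listing all of its variants.
accent_regex() {
    local word=$1 db=${2:-$HOME/.mydb}
    local out="" ch variants i
    for (( i = 0; i < ${#word}; i++ )); do
        ch=${word:i:1}
        # First database line containing this character, if any
        variants=$(grep -F -m1 -- "$ch" "$db")
        if [ -n "$variants" ]; then
            out+="[$variants]"   # any one of the variants
        else
            out+=$ch             # unknown character: keep it literally
        fi
    done
    printf '%s\n' "$out"
}
```

It could then be used as locate --regex "$(accent_regex liberacion)\$". Running accent_regex liberacion on its own shows exactly which pattern would be searched for.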
Now save it under a name of your choice somewhere in your PATH, e.g. in ~/bin (which should already be in your PATH environment), and make it executable.
After that, simply run something like this to search for all possible combinations:
my-locate.sh liberacion
It will find all of these for me:
~/lab/liberacion
~/lab/liberaciòn
~/lab/liberación
~/lab/liberâciòn
~/lab/liberäciòn
~/lab/libÈrâciòn
Now, with mlocate 0.26, we have the -t / --transliterate option (see the man page) on Ubuntu 18.04+, with no need for workarounds:
Creating some test files:
$ touch liberación liberacion liberaciôn
Update and search:
$ sudo updatedb
$ locate --transliterate liberacion
/home/pablo/liberacion
/home/pablo/liberación
/home/pablo/liberaciôn
So now locate -t liberación also finds files named liberacion and even liberaciòn!
Finally, I created an alias in my .bashrc :-)
$ alias locate="locate --transliterate"