Redis autocomplete
If you're dealing with a large data set, I would suggest considering implementing this as a trie. I have thrown together a small bit of Ruby that would do this:
require 'rubygems'
require 'redis'
class RedisTrie
TERMINAL = '+'
def initialize(prefix)
@prefix = prefix
@r = Redis.new
end
def add_word(word)
w = word.gsub(/[^a-zA-Z0-9_-]/, '')
key = "#{@prefix}:"
w.each_char do |c|
@r.zset_add key, c.bytes.first, c
key += c
end
@r.zset_add key, 0, TERMINAL
end
def add_words(*words)
words.flatten.compact.each {|word| add_word word}
end
def suggest(text)
@r.zset_range("#{@prefix}:#{text}", 0, -1).map do |c|
(c == TERMINAL) ? text : suggest(text + c)
end.flatten
end
end
rt = RedisTrie.new('trie')
rt.add_words %w( apple automobile carwash oil-change cranky five ruthie axe auto )
p rt.suggest(ARGV.shift.to_s)
For example:
$ ruby RedisTrie.rb
["apple", "auto", "automobile", "axe", "carwash", "cranky", "five", "oil-change", "ruthie"]
$ ruby RedisTrie.rb a
["apple", "auto", "automobile", "axe"]
$ ruby RedisTrie.rb au
["auto", "automobile"]
$ ruby RedisTrie.rb aux
[]
Read more on Tries at Wikipedia's entry on Tries.
You will definitely want to optimize your suggest method to not return ALL values, instead only returning the first X values it finds. It would defeat the purpose to iterate the entire data structure.
Here's a dead simple algorithm in PHP for alphabetical autocomplete with redis:
function getNextChar($char) {
$char++;
if(strlen($char) > 1) { $char--; }
return $char;
}
function createDictionary($redis, $key, $wordList) {
if(!$redis->exists($key)) {
foreach($wordList as $word) {
$redis->zadd($key, 0, $word);
}
}
}
function getLexicalAutocomplete($redis, $dictionaryKey, $input) {
$inputNext = substr($input, 0, -1) . getNextChar(substr($input, -1)); //ab -> ac
$redis->zadd($dictionaryKey, 0, $input);
$redis->zadd($dictionaryKey, 0, $inputNext);
$rangeStart = $redis->zrank($dictionaryKey, $input)+1;
$rangeEnd = $redis->zrank($dictionaryKey, $inputNext)-1;
$autocompleteResults = $redis->zrange($dictionaryKey, $rangeStart, $rangeEnd);
$redis->zrem($dictionaryKey, $input);
$redis->zrem($dictionaryKey, $inputNext);
return $autocompleteResults;
}
$redis = new Redis();
$redis->connect('', 0); //Your redis server ip/port goes here
createDictionary($redis, "dict", array("alfred", "joel", "jeff", "addick"));
$result = getLexicalAutocomplete($redis, "dict", $argv[1]);
echo json_encode($result);
Based on the article Auto Complete with Redis by Salvatore, except I hax the need to generate an additional autocomplete dictionary, at the expense of a tiny little bit of performance penalty (a couple of zadds and zrems extra) but in most cases it should perform well. The script assumes phpredis, but it should practically be the same with predis.
Output examples:
> php redisauto.php a
["addick","alfred"]
> php redisauto.php ad
["addick"]
> php redisauto.php al
["alfred"]
> php redisauto.php j
["jeff","joel"]
> php redisauto.php je
["jeff"]
I also found this snippet when reading Simon Willison's impressive Redis tutorial.
Solution:
Hello Max,
KEYS is not the way to go, the best thing you can do is to use instead a sorted set. What you want is to turn the first 4 or 5 characters of the strings into an integer (you can imagine every char as a digit of a radix 256 number for instance, but there are better representation) and add all your usernames into a sorted set.
Then using ZRANGEBYSCORE you can get all the elements between a given range.
This method is much more scalable as it's an O(log(N)) thing.
I'm covering this stuff in my very slowly evolving Redis book...
Cheers, Salvatore
Here is a port of original antirez's Ruby implementation in Python:
http://www.varunpant.com/posts/auto-complete-with-redis-python