Using Python, find anagrams for a list of words
One solution is to sort the word you're searching anagrams for (for example using sorted
), sort the alternative and compare those.
So if you would be searching for anagrams of 'rac' in the list ['car', 'girl', 'tofu', 'rca']
, your code could look like this:
word = sorted('rac')
alternatives = ['car', 'girl', 'tofu', 'rca']
for alt in alternatives:
if word == sorted(alt):
print alt
Create a dictionary of (sorted word, list of word). All the words that are in the same list are anagrams of each other.
from collections import defaultdict
def load_words(filename='/usr/share/dict/american-english'):
with open(filename) as f:
for word in f:
yield word.rstrip()
def get_anagrams(source):
d = defaultdict(list)
for word in source:
key = "".join(sorted(word))
d[key].append(word)
return d
def print_anagrams(word_source):
d = get_anagrams(word_source)
for key, anagrams in d.iteritems():
if len(anagrams) > 1:
print(key, anagrams)
word_source = load_words()
print_anagrams(word_source)
Or:
word_source = ["car", "tree", "boy", "girl", "arc"]
print_anagrams(word_source)
In order to do this for 2 strings you can do this:
def isAnagram(str1, str2):
str1_list = list(str1)
str1_list.sort()
str2_list = list(str2)
str2_list.sort()
return (str1_list == str2_list)
As for the iteration on the list, it is pretty straight forward
There are multiple solutions to this problem:
Classic approach
First, let's consider what defines an anagram: two words are anagrams of each other if they consist of the same set of letters and each letter appears exactly the same number or time in both words. This is basically a histogram of letters count of each word. This is a perfect use case for
collections.Counter
data structure (see docs). The algorithms is as follows:- Build a dictionary where keys would be histograms and values would be lists of words that have this histogram.
- For each word build it's histogram and add it to the list that corresponds to this histogram.
- Output list of dictionary values.
Here is the code:
from collections import Counter, defaultdict def anagram(words): anagrams = defaultdict(list) for word in words: histogram = tuple(Counter(word).items()) # build a hashable histogram anagrams[histogram].append(word) return list(anagrams.values()) keywords = ("hi", "hello", "bye", "helol", "abc", "cab", "bac", "silenced", "licensed", "declines") print(anagram(keywords))
Note that constructing
Counter
isO(l)
, while sorting each word isO(n*log(l))
where l is the length of the word.Solving anagrams using prime numbers
This is a more advanced solution, that relies on the "multiplicative uniqueness" of prime numbers. You can refer to this SO post: Comparing anagrams using prime numbers, and here is a sample python implementation.