I have always been fascinated by the way mobile phones complete words for you when you are creating an SMS message. So I decided to try implement
that algorithm in Java.
We know that each number key on the phone is mapped to a set of 3-4 letters.
1 is
(a,b,c),
2 is
(d,e,f) etc. I call them "buckets". Given a sequence of key strokes, a simple algorithm is to generate all combinations of letters from the corresponding buckets and then suggest only those combinations that exist in a dictionary of words.
For example:Sequence
63 can be any of
md,
me,
mf,
nd,
ne,
nf,
od,
oe,
of. The dictionary will have only "
me" and "
of". You can show them in alphabetical order, or in the order of most frequently used, last used etc.
Here are the 2 main functions in this approach. (You can find the full program
here):
getMatchingWords returns a list of all alternatives for a given word. It also requires a dictionary of words and the bucket structure of the keypad.
getBucketWords is the recursive function where all the action happens. Assume the word has 4 letters (so 4 buckets) of which the first 2 you have somehow filled. Now you have to fill the 3rd letter (from all characters in the 3rd bucket). The way to proceed is to pick each letter in the current bucket, append it to the so-far generated string, and call yourself recursively. Here
curStr and
result are used as accumulator variables.
static Set getMatchingWords(String word, Set dict,
List buckets){
Set matches = new HashSet();
List wordBuckets = getWordBuckets(word, buckets);
getBucketWords(wordBuckets, 0, "", matches);
matches.retainAll(dict);
return matches;
}
static Set getBucketWords(List wordBuckets, int bucketIndex,
String curStr, Set result){
if(bucketIndex == wordBuckets.size()){
result.add(curStr);
}else{
String curBucket = (String)wordBuckets.get(bucketIndex);
for(int i=0;<curBucket.length();i++){
String newStr = new String(curStr);
newStr = newStr + curBucket.charAt(i);
getBucketWords(wordBuckets, bucketIndex+1, newStr,
result);
}
}
return result;
}
The biggest failing of this method is all the invalid words that are generated only to be discarded. For a word of length
n, if you have on average
m options for each letter, then the number of words generated is
m to the power n. This is the time complexity of this program.
But consider if the dictionary were already sorted into different buckets according to the key sequences. This is eminently possible because mobile phones can come loaded with such a data structure to facilitate fast lookup.
In that case, for a given key sequence, we only need to look up the data structure for all matching words and return 1 or more. Of course the dictionary pre-processing would have been done offline.
Here's a function which implements this improved algorithm. That full program is
here.
public List getMatches(String word){
List matches = new ArrayList();
String key = getKeyForWord(word);
if(key != null){
if(bucketedWords.containsKey(key)){
matches = (List)bucketedWords.get(key);
}
}
return matches;
}
The dictionary is pre-processed in 2 steps: First, each character in the keypad is mapped to its bucket number. Then, for each word in the dictionary, a bucket key is obtained by concatenating the bucket number of each character in the word. The word is added to the Map entry corresponding to that key.
To do a lookup, the bucket key for a given key sequence is derived as above, and all entries in the Map matching that bucket key are returned.
Each lookup is
O(number of characters in the letter) to generate the key. To generate the bucketed dictionary itself, you need to examine
every character in the dictionary once. If the dictionary has
n words of average length
m, it's
O(m * n).