Purging Anagrams

Let's assume we have already downloaded a free online dictionary (there are loads if you Google it) and that the dictionary is a text file with each word on a separate line. Let's also assume we have already run a program to split the dictionary into multiple files so that one file contains only words of a given length. The file containing three letter words might look something like the following:

ace
act
apt
pat
pot
tap
tag
top

Our anagram game is going to pick a random word from the list each time the user starts a new game. There's just one problem with the list above: some anagrams do not have unique solutions. For example, the word "apt" is an anagram of "pat" and "tap". Before we can use this list for our anagram we need to purge all words in the list which are anagrams of one or more other words in the list. In a small list like the one above this would not take much processing power and would execute in a fraction of a second. For a dictionary containing 10,000 words we would have to compare each word to the other 9,999 words to check whether it was an anagram. This would involve 10,000! (factorial) iterations. That number of iterations is 35,659 digits long so it wouldn't fit on this page. We need a quick way of determining if one word is an anagram of another.

We are going to assign each letter of the alphabet to a unique prime number. Number theory tell us that two words are anagrams if and only if the product of the prime numbers mapping to their letters are equal. Let's illustrate this with an example.

'A' is the first letter of the alphabet so we map it to the first prime number, which is 2. 'P' is the sixteenth letter of the alphabet so we map it to the sixteenth prime number, which is 53. 'T' is the twentieth letter of the alphabet so we map it to the twentieth prime number, which is 71.

If we multiply these three numbers we get 7526 (= 2 x 53 x 71). Number theory tells us that if we multiply these three numbers in any order we will reach the same product. This means that the words "apt", "pat" and "tap" will all evaluate to the number 7526 using our algorithm, meaning that they are anagrams. The words "pot" and "top" on the other hand each evaluate to 176861 meaning that neither of them are anagrams of "apt". The performance of this function is O(n), making it nice and fast.

An applet that implements this functionality and its source code can be found below.

Your browser understands the <APPLET> tag but isn't running the applet, for some reason. Your browser is ignoring the <APPLET> tag!

Java Source Code

AnagramPurgerApplet.java