A Better Phonetic Lookup
This Java applet uses the “Metaphone” phonetic code algorithm described by
Lawrence Philips in the December 1990 issue of Computer Language. This
algorithm produces better matches than the Soundex algorithm. An input
word is reduced to a code of one to four characters using relatively simple
phonetic rules for typical spoken English.
Type a word into the test word field and press return or click on the
Calculate button to see the resulting phonetic code.
To test phonetic lookup based on this code, choose one of the Word Sources. This reads a file of words from the server and places each word in a lookup class, which calculates the word's phonetic code on the fly and uses it as the key. Once a word source is loaded, any words with the same phonetic code as the word typed in the test field are displayed in the Matches text area.
The women’s names, men’s names, and place-name files are from Grady Ward’s “Moby Words” collection, which he placed in the public domain.
The Metaphone Rules
Metaphone reduces the alphabet to 16 consonant sounds:
B X S K J T F H L M N P R 0 W Y
That isn’t an O but a zero – representing the ‘th’ sound.
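As a small illustration, the code alphabet can be held as a string constant and used to sanity-check a generated code. This is a hypothetical helper (the names MetaphoneAlphabet and isValidCode are illustrative, not part of the applet), assuming that, per the vowel rule under Transformations, a vowel may appear only in the first position of a code.

```java
public class MetaphoneAlphabet {
    // The 16 consonant codes; '0' (zero) stands for the "th" sound.
    public static final String CONSONANT_CODES = "BXSKJTFHLMNPR0WY";

    /** True if every character is a consonant code, or a vowel in first position. */
    public static boolean isValidCode(String code) {
        for (int i = 0; i < code.length(); i++) {
            char c = code.charAt(i);
            boolean vowelAtStart = i == 0 && "AEIOU".indexOf(c) >= 0;
            if (CONSONANT_CODES.indexOf(c) < 0 && !vowelAtStart) {
                return false;
            }
        }
        return true;
    }
}
```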
Transformations
Metaphone uses the following transformation rules:
Doubled letters except “c” -> drop 2nd letter.
Vowels are only kept when they are the first letter.
B -> B unless at the end of a word after "m" as in "dumb"
C -> X (sh) if -cia- or -ch-
S if -ci-, -ce- or -cy-
K otherwise, including -sch-
D -> J if in -dge-, -dgy- or -dgi-
T otherwise
F -> F
G -> silent if in -gh- and not at end or before a vowel
in -gn- or -gned-, and in -dge-, -dgy- or -dgi- (see D above)
J if before i, e or y, unless the g is doubled (-gg-)
K otherwise
H -> silent if after vowel and no vowel follows
H otherwise
J -> J
K -> silent if after "c"
K otherwise
L -> L
M -> M
N -> N
P -> F if before "h"
P otherwise
Q -> K
R -> R
S -> X (sh) if before "h" or in -sio- or -sia-
S otherwise
T -> X (sh) if -tia- or -tio-
0 (th) if before "h"
silent if in -tch-
T otherwise
V -> F
W -> silent if not followed by a vowel
W if followed by a vowel
X -> KS
Y -> silent if not followed by a vowel
Y if followed by a vowel
Z -> S
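The rules above can be sketched as a single pass over the word. This is a minimal, hypothetical Java version (MetaphoneRules and transform are illustrative names, not the PhoneticList code): it upper-cases the input, drops non-letters, collapses doubled letters other than C while remembering which ones were doubled (needed by the G rule), then applies each letter's rule by looking at its neighbors. Initial-letter exceptions and truncation are left out. Two assumptions go beyond the list as written: H is also treated as silent after C, G, P, S or T, because the list already codes ch, ph, sh and th as single sounds and a literal reading would emit a stray H; and G is dropped before any N, the literal reading of -gn-.

```java
import java.util.Locale;

public class MetaphoneRules {

    private static boolean isVowel(char c) {
        return "AEIOU".indexOf(c) >= 0;
    }

    /** Applies the per-letter rules only: no initial-letter exceptions, no truncation. */
    public static String transform(String input) {
        String w = input.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");

        // Doubled letters except C: keep one copy, but remember that it was doubled.
        StringBuilder sb = new StringBuilder();
        boolean[] doubled = new boolean[w.length()];
        for (char c : w.toCharArray()) {
            int len = sb.length();
            if (len > 0 && sb.charAt(len - 1) == c && c != 'C') {
                doubled[len - 1] = true;
            } else {
                sb.append(c);
            }
        }
        String s = sb.toString();
        int n = s.length();

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            char prev = i > 0 ? s.charAt(i - 1) : '\0';
            char next = i + 1 < n ? s.charAt(i + 1) : '\0';
            char next2 = i + 2 < n ? s.charAt(i + 2) : '\0';
            boolean soft = "EIY".indexOf(next) >= 0;        // next letter is e, i or y
            switch (c) {
                case 'A': case 'E': case 'I': case 'O': case 'U':
                    if (i == 0) out.append(c);                  // vowels kept only as first letter
                    break;
                case 'B':
                    if (!(prev == 'M' && i == n - 1)) out.append('B');   // silent in "dumb"
                    break;
                case 'C':
                    if (prev == 'S' && next == 'H') out.append('K');     // -sch-
                    else if (next == 'H' || (next == 'I' && next2 == 'A')) out.append('X');
                    else if (soft) out.append('S');                       // -ci-, -ce-, -cy-
                    else out.append('K');
                    break;
                case 'D':
                    out.append(next == 'G' && "EIY".indexOf(next2) >= 0 ? 'J' : 'T');
                    break;
                case 'G':
                    if (next == 'H' && i + 2 < n && !isVowel(next2)) break; // -gh- not at end or before vowel
                    if (next == 'N') break;                     // -gn- (literal reading, covers -gned-)
                    if (prev == 'D' && soft) break;             // -dge-, -dgy-, -dgi-
                    out.append(soft && !doubled[i] ? 'J' : 'K');
                    break;
                case 'H':
                    if (isVowel(prev) && !isVowel(next)) break;
                    if ("CGPST".indexOf(prev) >= 0) break;      // ASSUMPTION: H of ch/gh/ph/sh/th is silent
                    out.append('H');
                    break;
                case 'K':
                    if (prev != 'C') out.append('K');
                    break;
                case 'P':
                    out.append(next == 'H' ? 'F' : 'P');
                    break;
                case 'Q':
                    out.append('K');
                    break;
                case 'S':
                    out.append(next == 'H' || (next == 'I' && (next2 == 'O' || next2 == 'A')) ? 'X' : 'S');
                    break;
                case 'T':
                    if (next == 'I' && (next2 == 'A' || next2 == 'O')) out.append('X');
                    else if (next == 'H') out.append('0');
                    else if (!(next == 'C' && next2 == 'H')) out.append('T');  // silent in -tch-
                    break;
                case 'V':
                    out.append('F');
                    break;
                case 'W': case 'Y':
                    if (isVowel(next)) out.append(c);           // kept only before a vowel
                    break;
                case 'X':
                    out.append("KS");
                    break;
                case 'Z':
                    out.append('S');
                    break;
                default:                                        // F, J, L, M, N, R map to themselves
                    out.append(c);
            }
        }
        return out.toString();
    }
}
```

For example, transform("Thomas") gives "0MS", transform("judge") gives "JJ", and transform("bigger") gives "BKR", where the remembered doubling keeps the G from becoming J.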
Initial Letter Exceptions
Initial kn-, gn-, pn-, ae- or wr- -> drop first letter
Initial x- -> change to "s"
Initial wh- -> change to "w"
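The initial-letter exceptions are a simple rewrite of the start of the word, applied before the letter rules. A minimal sketch, with MetaphoneInitial and applyInitialExceptions as hypothetical names:

```java
import java.util.Locale;

public class MetaphoneInitial {

    /** Rewrites the start of the word according to the initial-letter exceptions. */
    public static String applyInitialExceptions(String word) {
        String w = word.toUpperCase(Locale.ROOT);
        if (w.startsWith("KN") || w.startsWith("GN") || w.startsWith("PN")
                || w.startsWith("AE") || w.startsWith("WR")) {
            return w.substring(1);              // drop the first letter
        }
        if (w.startsWith("X")) {
            return "S" + w.substring(1);        // initial x- becomes s
        }
        if (w.startsWith("WH")) {
            return "W" + w.substring(2);        // initial wh- becomes w
        }
        return w;
    }
}
```

In a full encoder, the rewritten word would then go through the per-letter rules and the resulting code would be cut to the desired length.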
The code is truncated at 4 characters in this example, but more could be used.
Lawrence Philips, “Hanging on the Metaphone”, Computer Language, vol. 7, no. 12, December 1990, pp. 39-43.
Java PhoneticList Class
I have implemented the Metaphone code as part of a class called PhoneticList. As the name indicates, this class tracks lists of objects by the Metaphone code derived from a key string. Operation is similar to a Hashtable except that any number of Objects can have the same code and an Object array is returned by the lookup function. In the example applet, the Objects are Strings but they could be anything.
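The Hashtable-like behavior described above can be sketched with a map from code to a list of objects. This is not the actual PhoneticList source: the class name PhoneticListSketch and the methods put and lookup are hypothetical, and the phonetic coder is passed in as a function so the sketch works with any encoder (in the applet it would be the Metaphone routine, and the objects would be the words from the selected Word Source).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class PhoneticListSketch {
    private final Map<String, List<Object>> table = new HashMap<>();
    private final Function<String, String> coder;   // computes the phonetic code of a key

    public PhoneticListSketch(Function<String, String> coder) {
        this.coder = coder;
    }

    /** Files the object under the phonetic code of its key; codes may be shared. */
    public void put(String key, Object value) {
        table.computeIfAbsent(coder.apply(key), k -> new ArrayList<>()).add(value);
    }

    /** Returns every object filed under the same code as key (empty array if none). */
    public Object[] lookup(String key) {
        List<Object> hits = table.get(coder.apply(key));
        return hits == null ? new Object[0] : hits.toArray();
    }
}
```

Returning an array rather than a single value is what lets several words with the same code, such as different spellings of one name, all come back from one lookup.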
A working version is available at Bill Brogden’s website.
