FreePascal Information Logo Friend of FreePascal Compiler Title
Articles with Feedback, FPC News Library, PDF Collection, Mail Lists, Books, Newsgroups, IRC Open online discussion areas Research and Tutorials Tools, Compilers and Utilities Blurbs about us, advertising, etc.
Welcome to the FoFPC Research Notes: Double-Metaphone

Data Searching using Double-Metaphone

by: G.E. Ozz Nixon Jr.
Published: October 2007
©opyright 2009 by Friends of FPC



     Data Searching is a very complex art in itself. Ignoring the long history of Internet "search engines" growing from Archie in 1992 to today's Google engine. Original techniques to provide users with "full text" search required a solution designed for common mistakes, one of which is called Metaphone.

     Double-Metaphone is a second generation phonetic algorithm, based upon an algorithm published in 1990 for indexing words by pronunciation. The algorithm produces variable length keys as its output, as opposed to Soundex's fixed-length keys. Similar sounding words share the same keys. Double-Metaphone differs from Metaphone by introducing primary and secondary codes, supporting pronunciation in English, Slavic, Germanic, Celtic, Greek, French, Italian, Spanish and Chinese.

Code snippet from DXSock 6.0 WLM (Courtesy of Brain Patchwork DX, LLC)
function Metaphone2(S: string): string; // different design
Const MaxNum=22;
      Tab:Array[1..MaxNum,1..8] Of String=(
{      Org,Ers,Nach,Vor,NotNach,NotVor,Anf,Ende}
      ('Ä','E','','','','','',''),
      ('AE','E','','','','','',''),
      ('AY','AI','','','','','',''),
      ('B','P','','','','','','1'),
      ('CHS','KS','','','','','',''),
      ('CK','K','','','','','',''),
      ('C','K','','','','KH','',''),
      ('D','T','','','','','','1'),
      ('EY','AI','','','','','',''),
      ('EI','AI','','','','','',''),
      ('G','K','','','N','','','1'),
      ('PF','F','','','','','1',''),
      ('PH','F','','','','','1',''),
      ('PN','N','','','','','1',''),
      ('QU','KW','','','','','',''),
      ('Q','K','','','','U','',''),
      ('TZ','TS','','','','','',''),
      ('V','F','','','','','',''),
      ('X','KS','','','','','',''),
      ('Y','J','','','','','1',''),
      ('Y','Ü','','','EA','','0',''),
      ('Z','TS','','','T','','','')
);
Var Anfang,Ende:Boolean;
    Vor,Nach,Work,Output:String;
    L,I,Num:Integer;
begin
   Result:='';
   If (S='') then Exit;
{$IFDEF ASSEMBLY}
   asm // 1090108: By Ozz (FASTER THAN Length()!)
     MOV EAX, S;       // Store Str Address
     MOV EAX, [EAX-$04]; // Move to "Size" Int32
     MOV Num, EAX;    // Put into Result
   End;
{$ELSE}
   Num:=Length(S);
$ENDIF}
   if Num>0 then begin
      S:=Uppercase(S);
      I:=1;
      While I1)Or(Tab[Num,3]=''))AND
               ((QuickPos(Nach,Tab[Num,4])>1)Or(Tab[Num,4]=''))AND
               (QuickPos(Vor,Tab[Num,5])=0)AND
               (QuickPos(Nach,Tab[Num,6])=0)AND
               ((Anfang And (Tab[Num,7]='1'))OR
               (Not(Anfang)And(Tab[Num,7]='0'))OR(Tab[Num,7]=''))AND
               ((Ende And (Tab[Num,8]='1'))OR
               (Not(Ende)And(Tab[Num,8]='0'))OR(Tab[Num,8]='')) Then Begin
               Output:=Output+Tab[Num,2];
               If Length(Tab[Num,1])>1 Then Inc(I,Length(Tab[Num,1])-1);
            End
            Else Output:=Output+Work[1];
         End
         Else OutPut:=OutPut+Work[1];
         Inc(I,1);
      Until I>L;
      Result:=Output;
   end;
end;

     Designed to use Levenshtein, this algorithm returns "Robert" and "Rupert" as "66.66%" the same, while "Rubin" and "Robin" yields "80%". "Ashcroft" and "Ashcraft" yields "87.5%". However, "George" and "Jorge" yields "66.66%", and "Dave" and "David" yields "60%" the same.

     In these simple tests, Metaphone2 with Levenshtein yield the same as using plain Levenshtein against the words. However, introduce words from Europe and this version of the Metaphone helps you match words much better than Levenshtein by itself. Read more on other comparison techniques I used in my search engines.

G.E. Ozz Nixon Jr.
 Links and Products we find useful



ButtonGenerator.com
Valid XHTML 1.0 Transitional Internet Map
Programmer's Heaven
grat-i-fi-ca-tion - noun
the state of being gratified; great satisfaction.


"Your research documents are head on ... I look forward to seeing more notes on your research."

Brian Ellixson
FreePascal User
Locations of visitors to this page world map hits counter
Copyright 2009 by 3F, LLC. All rights reserved. Worldwide.
Your request was processed by server #3 in 0.001545 secs.

sponsor
Click to visit our paid sponsor