Theiling Online    Sitemap    Conlang Mailing List HQ   

Re: Lexicon management

From:Benct Philip Jonsson <bpj@...>
Date:Tuesday, August 17, 2004, 12:16
I wrote:
> Jim Henry wrote: > >> The original poster asked about automatically converting >> a conlang-to-natlang dictionary to a natlang-to-conlang dictionary. >> I don't think it's possible. But given a tractable format, you might >> could use >> an appropriate script to generate a rough draft which you would then have >> to extensively edit. > > > I once made a perl script for just that. Will see if I can find it!
I found it! (I removed some things peculiar to the original application.) #! /usr/bin/perl -w # Make a reverse word list: heads and glosses are separated by tabs, # alternative glosses are separated by comma+space. Like this: # # apa ape, monkey # skog wood, forest # trä wood # # which should be turned into: # # ape apa # forest skog # monkey apa # wood skog, trä # # (c) BP Jonsson, melroch@melroch.se, August 2004 $file = shift(@ARGV); $outfile = 'reverse-' . $file; open(IN, $file); open(OUT, ">$outfile"); # Initialize the hash to hold the reverse list %revlist = (); while(<IN>){ chomp; # split words from glosses at the tab ($word,$glosses) = split(/\t/,$_); # zap gloss material in parentheses $glosses =~ s/ ?([()]) ?/$1/g; $glosses =~ s/\(.*\)//g; $glosses =~ s/^of //g; $glosses =~ s/^a //g; $glosses =~ s/^an //g; $glosses =~ s/^the //g; $glosses =~ s/^to //g; $glosses =~ s/,of /, /g; $glosses =~ s/,a /, /g; $glosses =~ s/,an /, /g; $glosses =~ s/,the /, /g; $glosses =~ s/,to /, /g; # make sure each comma is followed by exactly one space $glosses =~ s/, */,/g; $glosses =~ s/,/, /g; # split the glosses into a list @glosses = split(/, /,$glosses); # go thru the glosses foreach $g (@glosses){ if($g ne ''){ # make sure the gloss isn't empty! unless(exists($revlist{$g})){$revlist{$g} =$word} else{ # remove spaces # $g =~ s/^ //g; $g =~ s/ $//g; # add the new word to the value of the entry $val = $revlist{$g}; $val = "$val, $word"; $revlist{$g} = $val; } } } } foreach $key (sort(keys(%revlist))){ print OUT "$key: $revlist{$key}\n" } __END__ -- /BP 8^) -- B.Philip Jonsson -- melroch at melroch dot se Solitudinem faciunt pacem appellant! (Tacitus) /BP 8^) -- B.Philip Jonsson -- melroch at melroch dot se Solitudinem faciunt pacem appellant! (Tacitus)