Re: Lexicon management
From: | Benct Philip Jonsson <bpj@...> |
Date: | Tuesday, August 17, 2004, 12:16 |
I wrote:
> Jim Henry wrote:
>
>> The original poster asked about automatically converting
>> a conlang-to-natlang dictionary to a natlang-to-conlang dictionary.
>> I don't think it's possible. But given a tractable format, you might
>> could use
>> an appropriate script to generate a rough draft which you would then have
>> to extensively edit.
>
>
> I once made a perl script for just that. Will see if I can find it!
I found it! (I removed some things peculiar to the original
application.)
#! /usr/bin/perl -w
# Make a reverse word list: heads and glosses are separated by tabs,
# alternative glosses are separated by comma+space. Like this:
#
# apa ape, monkey
# skog wood, forest
# trä wood
#
# which should be turned into:
#
# ape apa
# forest skog
# monkey apa
# wood skog, trä
#
# (c) BP Jonsson, melroch@melroch.se, August 2004
$file = shift(@ARGV);
$outfile = 'reverse-' . $file;
open(IN, $file);
open(OUT, ">$outfile");
# Initialize the hash to hold the reverse list
%revlist = ();
while(<IN>){
chomp;
# split words from glosses at the tab
($word,$glosses) = split(/\t/,$_);
# zap gloss material in parentheses
$glosses =~ s/ ?([()]) ?/$1/g;
$glosses =~ s/\(.*\)//g;
$glosses =~ s/^of //g;
$glosses =~ s/^a //g;
$glosses =~ s/^an //g;
$glosses =~ s/^the //g;
$glosses =~ s/^to //g;
$glosses =~ s/,of /, /g;
$glosses =~ s/,a /, /g;
$glosses =~ s/,an /, /g;
$glosses =~ s/,the /, /g;
$glosses =~ s/,to /, /g;
# make sure each comma is followed by exactly one space
$glosses =~ s/, */,/g;
$glosses =~ s/,/, /g;
# split the glosses into a list
@glosses = split(/, /,$glosses);
# go thru the glosses
foreach $g (@glosses){
if($g ne ''){ # make sure the gloss isn't empty!
unless(exists($revlist{$g})){$revlist{$g} =$word}
else{
# remove spaces
# $g =~ s/^ //g; $g =~ s/ $//g;
# add the new word to the value of the entry
$val = $revlist{$g};
$val = "$val, $word";
$revlist{$g} = $val;
}
}
}
}
foreach $key (sort(keys(%revlist))){
print OUT "$key: $revlist{$key}\n"
}
__END__
--
/BP 8^)
--
B.Philip Jonsson -- melroch at melroch dot se
Solitudinem faciunt pacem appellant!
(Tacitus)
/BP 8^)
--
B.Philip Jonsson -- melroch at melroch dot se
Solitudinem faciunt pacem appellant!
(Tacitus)