Re: Words with built-in error correction
From: Jim Henry <jimhenry1973@...>
Date: Tuesday, December 6, 2005, 0:20
On 12/5/05, Gary Shannon <fiziwig@...> wrote:
> Begin with a set of word templates which have only
> consonants and places indicated for vowels, but no
> specific vowels in them:
>
> B- D- F- G- H- J- K- L- M- ....
> When a new word is coined a template is selected and
> whatever vowels you like are inserted.
> T-LK-M-T- can become talkamata, or tolkimato, or
> tulkometi, or any one of 625 different words with that
> consonant pattern.
>
> Once the word is selected and added to the lexicon the
> template is discarded, never to be used again. Thus
> every core word in the lexicon has a unique consonant
> pattern, and no matter how the pronunciation of the
A while ago I thought of a system for devising morphemes
such that no two morphemes in the language would differ by
only one phoneme; in other words, there would be no minimal
pairs. I wrote a script to generate word lists from a given
phoneme inventory, but never came up with a completely
satisfying phoneme inventory to use with it.
To make it maximally redundant, I
tried to have every phoneme differ
from every other by at least two distinctive
features, which led to some interesting
but not very euphonious phonologies.
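Once each phoneme has a feature specification, the two-feature requirement can be checked mechanically. Below is a minimal sketch that assumes binary features. The table (voiced, nasal, labial, coronal) is a made-up example, not an inventory from this thread, and `feature_distance` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical feature table: 0/1 values for (voiced, nasal, labial, coronal).
my %features = (
    p => [ 0, 0, 1, 0 ],
    b => [ 1, 0, 1, 0 ],
    m => [ 1, 1, 1, 0 ],
    t => [ 0, 0, 0, 1 ],
    n => [ 1, 1, 0, 1 ],
);

# Count the features on which two phonemes disagree.
sub feature_distance {
    my ( $x, $y ) = @_;
    my ( $fx, $fy ) = ( $features{$x}, $features{$y} );
    return scalar grep { $fx->[$_] != $fy->[$_] } 0 .. $#$fx;
}

# Collect every pair of phonemes that differs in fewer than two features.
my @phonemes = sort keys %features;
my @too_close;
for my $i ( 0 .. $#phonemes - 1 ) {
    for my $j ( $i + 1 .. $#phonemes ) {
        my ( $x, $y ) = @phonemes[ $i, $j ];
        push @too_close, "$x $y" if feature_distance( $x, $y ) < 2;
    }
}
print "too close: $_\n" for @too_close;
```

With this toy table, /b/ is the problem: it is one feature away from both /p/ (voicing) and /m/ (nasality), so it would have to go.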
Such a system works best with equal numbers of consonants
and vowels. For instance, with 8 vowels and 8 consonants you
can devise 64 CVC morphemes, each of which differs from the
other 63 by at least two phonemes.
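The figure of 64 can be checked with a small sketch. Any two CVC words that share both their initial consonant and their vowel differ by at most one phoneme, so 8 x 8 = 64 is the ceiling. With unequal inventories, the ceiling drops to the product of the two smallest slots, which is one way to see why equal numbers work best. One way to reach the ceiling is a Latin-square layout: the final consonant's index is the sum of the other two indices mod 8. That layout is assumed here for illustration. The inventories and helper names are placeholders.

```perl
use strict;
use warnings;

# Hypothetical inventories (X-SAMPA symbols), 8 of each.
my @consonants = qw(p t k b d g m n);
my @vowels     = qw(i e E a O o u y);
my $q          = scalar @consonants;    # 8; @vowels must be the same size

# Latin-square rule: final consonant index = (initial index + vowel index) mod q.
my @morphemes;
for my $c ( 0 .. $q - 1 ) {
    for my $v ( 0 .. $q - 1 ) {
        push @morphemes, [ $c, $v, ( $c + $v ) % $q ];
    }
}

# Number of slots in which two morphemes (index triples) differ.
sub slot_distance {
    my ( $x, $y ) = @_;
    return scalar grep { $x->[$_] != $y->[$_] } 0 .. $#$x;
}

# Smallest distance over all pairs of morphemes.
my $min_distance;
for my $m1 ( 0 .. $#morphemes - 1 ) {
    for my $m2 ( $m1 + 1 .. $#morphemes ) {
        my $d = slot_distance( $morphemes[$m1], $morphemes[$m2] );
        $min_distance = $d if !defined $min_distance || $d < $min_distance;
    }
}

# Spell an index triple with the inventories above, e.g. [1, 3, 4] -> "tad".
sub spell {
    my ($m) = @_;
    return $consonants[ $m->[0] ] . $vowels[ $m->[1] ] . $consonants[ $m->[2] ];
}

printf "%d morphemes, minimum distance %d\n", scalar @morphemes, $min_distance;
print join( ' ', map { spell($_) } @morphemes[ 0 .. 7 ] ), "\n";
```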
I don't recall exactly how many more you
get when you go to CVCV or CVCVC
morphemes.
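If every slot draws on the same number q of phonemes, the CVC argument generalizes. Two words that agree in every slot but one are forbidden, so erasing the last slot must leave all words distinct, which caps the total at q^(n-1) for n slots. A "checksum" last slot (the sum of the other indices mod q) reaches that cap. With q = 8, that gives 8^3 = 512 CVCV and 8^4 = 4096 CVCVC morphemes. This is a standard coding-theory construction, swapped in here for illustration. The function names are made up.

```perl
use strict;
use warnings;

# All index tuples of length $n over 0 .. $q - 1 whose last slot is the
# sum of the other slots mod $q (a checksum slot).
sub checksum_morphemes {
    my ( $q, $n ) = @_;
    my @words = ( [] );
    for ( 1 .. $n - 1 ) {
        # Extend every partial word by each possible next index.
        @words = map { my $w = $_; map { [ @$w, $_ ] } 0 .. $q - 1 } @words;
    }
    for my $w (@words) {
        my $sum = 0;
        $sum += $_ for @$w;
        push @$w, $sum % $q;
    }
    return @words;
}

# Smallest number of slots in which any two of the given words differ.
sub min_distance {
    my @words = @_;
    my $min;
    for my $i ( 0 .. $#words - 1 ) {
        for my $j ( $i + 1 .. $#words ) {
            my $d = grep { $words[$i][$_] != $words[$j][$_] } 0 .. $#{ $words[$i] };
            $min = $d if !defined $min || $d < $min;
        }
    }
    return $min;
}

my @cvcv     = checksum_morphemes( 8, 4 );    # C V C V, 8 phonemes per slot
my @cvcvc    = checksum_morphemes( 8, 5 );    # C V C V C
my $cvcv_min = min_distance(@cvcv);
printf "CVCV:  %d morphemes, minimum distance %d\n", scalar @cvcv, $cvcv_min;
# A pairwise check over all 4096 CVCVC words is slow in Perl, so only count them.
printf "CVCVC: %d morphemes\n", scalar @cvcvc;
```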
You can pick one series of 8 CVC morphemes and truncate the
final consonant to get a group of CV morphemes that are just
as redundant with respect to the other 56.
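The truncation claim can be checked by brute force. This sketch makes three assumptions:

- "differ by" means edit distance over phonemes, where a substitution, insertion or deletion each counts as one;
- the morphemes come from the Latin-square rule final = (initial + vowel) mod 8;
- the "series" is the 8 morphemes that share one final consonant.

In that layout, the series has all-different initial consonants and all-different vowels. Their CV remnants therefore still differ from each other in both places. The inventories and helper names are illustrative.

```perl
use strict;
use warnings;
use List::Util qw(min);

my @consonants = qw(p t k b d g m n);    # hypothetical inventories (X-SAMPA)
my @vowels     = qw(i e E a O o u y);
my $q          = 8;

# Build the Latin-square CVC set, storing each morpheme as a list of phonemes.
# The series sharing final consonant index 0 is truncated to CV.
my ( @cv, @cvc );
for my $c ( 0 .. $q - 1 ) {
    for my $v ( 0 .. $q - 1 ) {
        my $f    = ( $c + $v ) % $q;
        my @word = ( $consonants[$c], $vowels[$v], $consonants[$f] );
        if   ( $f == 0 ) { push @cv, [ @word[ 0, 1 ] ] }
        else             { push @cvc, \@word }
    }
}

# Levenshtein distance between two phoneme lists.
sub edit_distance {
    my ( $x, $y ) = @_;
    my @prev = ( 0 .. @$y );
    for my $i ( 1 .. @$x ) {
        my @cur = ($i);
        for my $j ( 1 .. @$y ) {
            my $cost = $x->[ $i - 1 ] eq $y->[ $j - 1 ] ? 0 : 1;
            push @cur, min( $prev[$j] + 1, $cur[ $j - 1 ] + 1, $prev[ $j - 1 ] + $cost );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Smallest edit distance over every pair in the mixed CV + CVC set.
my @all = ( @cv, @cvc );
my $min_distance;
for my $i ( 0 .. $#all - 1 ) {
    for my $j ( $i + 1 .. $#all ) {
        my $d = edit_distance( $all[$i], $all[$j] );
        $min_distance = $d if !defined $min_distance || $d < $min_distance;
    }
}
printf "%d CV + %d CVC, minimum distance %d\n", scalar @cv, scalar @cvc, $min_distance;
print join( ' ', map { join '', @$_ } @cv ), "\n";
```

With these inventories, the CV group comes out as pi ty ku bo dO ga mE ne.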
Here is the Perl script I wrote for this.
# Generate redundant morphemes given a sequence of phoneme sets.
# The resulting morpheme set will have no two morphemes that differ by
# fewer than two phonemes.
# This first version is specialized for a sequence of three phonemes;
# later, generalize it for four or more.
#
# Input: a file in which each line "SLOT n" starts a new phoneme set,
# followed by one phoneme per line. Blank lines are ignored.
use strict;
use warnings;

my $debug = 0;    # 1 = trace the search, 2 = also dump the grid each step

my $inputfile = shift;
die "Argument: input file with phoneme lists\n" unless $inputfile;
open( my $phonemes_fh, '<', $inputfile )
    or die "Couldn't open $inputfile: $!\n";

# $phoneme_sets[slot][index] holds one phoneme.
my @phoneme_sets;
my $phoneme_set_count = -1;    # index of the current slot
my $phoneme_set_idx   = 0;     # next free position within that slot
while ( defined( my $line = <$phonemes_fh> ) ) {
    chomp $line;
    if ( $line =~ /SLOT *([0-9]+)/ ) {
        $phoneme_set_count++;
        $phoneme_set_idx = 0;
        next;
    }
    next if $line =~ /^ *$/;
    die "Phoneme '$line' appears before any SLOT line\n"
        if $phoneme_set_count < 0;
    print "assigning $line to row $phoneme_set_count column $phoneme_set_idx\n"
        if $debug;
    $phoneme_sets[$phoneme_set_count][$phoneme_set_idx] = $line;
    $phoneme_set_idx++;
}
close $phonemes_fh;

if ( $phoneme_set_count != 2 ) {
    die "this version only supports 3-phoneme sequences\n";
}

# Now generate the real morphemes.
# Note: this only works with 3 dimensions; it needs extensive work
# to do 4 or 5.
my @dimension_size;
for my $i ( 0 .. $phoneme_set_count ) {
    die "slot #" . ( $i + 1 ) . " has no phonemes\n" unless $phoneme_sets[$i];
    $dimension_size[$i] = scalar @{ $phoneme_sets[$i] };
    print "\$dimension_size[$i] = $dimension_size[$i]\n" if $debug;
}

# @morpheme_prism is a 3-D grid indexed by one phoneme from each slot.
# A cell is undef while still available, 1 once it is ruled out (it is
# only one phoneme away from a chosen morpheme), and 2 once chosen.
my @morpheme_prism;
my ( $i, $j, $k, $iter ) = ( 0, 0, 0, 0 );
while (1) {
    print "iteration " . ++$iter . " -- values ( $i, $j, $k )\n" if $debug;
    if ( $debug == 2 ) {
        print_whole_prism();
        print "==============\n";
    }
    print "try ($i, $j, $k)\n" if $debug;
    if ( !defined $morpheme_prism[$i][$j][$k] ) {
        # Available: choose it, then step diagonally in the first two slots.
        mark_used( \@morpheme_prism, $i, $j, $k );
        $i = ( $i + 1 ) % $dimension_size[0];
        $j = ( $j + 1 ) % $dimension_size[1];
        next;
    }
    # Already chosen or ruled out: shift the first slot and move on to
    # the next layer of the third slot.
    $i = ( $i + 1 ) % $dimension_size[0];
    $k++;
    last if $k >= $dimension_size[2];    # layer indices run 0 .. size - 1
}
print_whole_prism() if $debug;

# Dump the grid one third-slot layer at a time: chosen morphemes are
# spelled out, ruled-out cells show "xxx", untouched cells "000".
sub print_whole_prism {
    for my $o ( 0 .. $dimension_size[2] - 1 ) {
        for my $n ( 0 .. $dimension_size[1] - 1 ) {
            for my $m ( 0 .. $dimension_size[0] - 1 ) {
                my $cell = $morpheme_prism[$m][$n][$o] // 0;
                if ( $cell == 1 ) {
                    print "xxx ";
                }
                elsif ( $cell == 2 ) {
                    print $phoneme_sets[0][$m] . $phoneme_sets[1][$n]
                        . $phoneme_sets[2][$o] . " ";
                }
                else {
                    print "000 ";
                }
            }
            print "\n";
        }
        print "\n";
    }
}

# Choose the morpheme at (p, q, r): every cell that differs from it in
# exactly one slot is ruled out (1), the cell itself is marked chosen (2),
# and the morpheme is printed.
sub mark_used {
    my ( $arr_ref, $p, $q, $r ) = @_;
    # Update the caller's grid through the reference, not a copy.
    print "previous value: [" . ( $arr_ref->[$p][$q][$r] // 'undef' ) . "] "
        if $debug;
    print "marking " if $debug;
    for my $z ( 0 .. $dimension_size[0] - 1 ) {
        $arr_ref->[$z][$q][$r] = 1;
        print "($z, $q, $r) " if $debug;
    }
    print "\n" if $debug;
    for my $z ( 0 .. $dimension_size[1] - 1 ) {
        $arr_ref->[$p][$z][$r] = 1;
        print "($p, $z, $r) " if $debug;
    }
    print "\n" if $debug;
    for my $z ( 0 .. $dimension_size[2] - 1 ) {
        $arr_ref->[$p][$q][$z] = 1;
        print "($p, $q, $z) " if $debug;
    }
    print "\n" if $debug;
    $arr_ref->[$p][$q][$r] = 2;
    print "** $p, $q, $r ** " if $debug;
    print $phoneme_sets[0][$p] . $phoneme_sets[1][$q]
        . $phoneme_sets[2][$r] . "\n";
}
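The script's comments note that it only handles three slots. One way to lift that restriction is a plain brute-force greedy pass. It visits every index tuple in lexicographic order and keeps a tuple only if it is at least two slots away from everything already kept. This is a swapped-in technique, not the diagonal walk the script uses. `greedy_morphemes` and the inventories are hypothetical.

```perl
use strict;
use warnings;

# Greedy selection for any number of slots: visit every index tuple in
# lexicographic order (last slot varying fastest) and keep a tuple only
# if it differs in at least two slots from every tuple already kept.
sub greedy_morphemes {
    my @sizes = @_;    # number of phonemes in each slot
    my @tuple = (0) x @sizes;
    my @kept;
    while (1) {
        my $ok = 1;
        for my $prev (@kept) {
            my $diff = grep { $prev->[$_] != $tuple[$_] } 0 .. $#sizes;
            if ( $diff < 2 ) { $ok = 0; last }
        }
        push @kept, [@tuple] if $ok;

        # Advance @tuple like an odometer; stop after the last tuple.
        my $slot = $#sizes;
        while ( $slot >= 0 ) {
            last if ++$tuple[$slot] < $sizes[$slot];
            $tuple[$slot] = 0;
            $slot--;
        }
        last if $slot < 0;
    }
    return @kept;
}

# Hypothetical CVC inventories (X-SAMPA), 8 phonemes per slot.
my @phoneme_sets = (
    [qw(p t k b d g m n)],
    [qw(i e E a O o u y)],
    [qw(p t k b d g m n)],
);
my @morphemes = greedy_morphemes( map { scalar @$_ } @phoneme_sets );
print join( ' ',
    map { my $m = $_; join '', map { $phoneme_sets[$_][ $m->[$_] ] } 0 .. $#$m }
    @morphemes ), "\n";
printf "%d morphemes\n", scalar @morphemes;
```

With these equal inventories, the greedy pass keeps 64 CVC morphemes, matching the 8 x 8 ceiling. Each candidate is compared against every morpheme already kept, which is fine for a few thousand tuples but grows quickly as slots are added.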