Re: making up words
From: Sean M. Burke <sburke@...>
Date: Saturday, March 23, 2002, 23:00
On 2002-03-23, Jonathan Knibb wrote:
>I use Jeffrey Henning's LangMaker software to create random phoneme
>sequences, then apply a couple of phonotactic rules to these by hand, and
>keep the results in a word list. When I want a new word, I decide (out of
>my own head) how many syllables it should have, then go to the appropriate
>bit of the wordlist. What I do then is to circle the first three unused
Interesting!
I bet one could do well by just modifying this Perl program I
cobbled together:
#!/usr/bin/perl
require 5;
use strict;
#~~~~ Start of configurables ~~~~
my @onsets = qw(
p t k r l w
b d g h f sh ch s
);
my @codas = qw(
y
r n l nd ng nk
);
my @bad_clusters = qw(
yy aa ee ii oo uu nng db bd pt tp iy dk tg kh gh ph
nm mn nkk nkg mk np nb mg
pd bt dt td gk kg
);
# Yes, these could be made into phonotactic rules, but feh.
# Fails to capture vowel harmony, or the fact that adjacent
# closed syllables are pretty rare.
####
#
# Define functions that produce an onset, a nucleus, and a coda.
*onset = def_randomizer_simple(@onsets, '','','');
*nucleus = def_randomizer(
a => 10, e => 15, i => 9, u => 8, o => 12
);
# For an equiprobable distribution, it would be just:
# *nucleus = def_randomizer_simple(qw(a e i o u));
*coda = def_randomizer_simple(@codas, ('') x (2 * @codas));
# Define function telling us how many syllables in a word
*syllable_count = def_randomizer(1 => 5, 2 => 15, 3 => 5, 4 => 2);
#~~~~ End of configurables ~~~~
use Text::Wrap;
my @words;
{
  Word:
  foreach my $w_i (1 .. 40) {
    my $syllable_count = syllable_count();
    my $word = '';
    Syllable:
    foreach my $s_i (1 .. $syllable_count) {
      my $syllable = onset() . nucleus() . coda();
      $word .= $syllable;
    }
    foreach my $c (@bad_clusters) {
      redo Word unless index($word, $c) == -1;
    }
    push @words, $word;
  }
}
print wrap(' ', '', ucfirst(join ' ', @words) . ".\n");
exit;
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Randomizer metafunctions:
sub def_randomizer_simple {
  # A function that generates functions that pick
  # from EQUALLY weighted string alternatives.
  return sub { return undef } unless @_;
  my @x = @_;
  return sub { return $x[rand @x] };
}
sub def_randomizer {
  # A function that generates functions that pick
  # from unequally weighted string alternatives.
  # So def_randomizer( x => 10, y => 13, z => 7 )
  # produces a function that imagines 30 cards (10+13+7), of
  # which 10 say 'x', 13 say 'y', and 7 say 'z'; and calling
  # the function randomly draws a card from the deck, returns
  # what's on the card, and puts the card back.
  my %p = @_; # the distribution: foo => 10, bar => 2, etc.
  my($sum, @cases, $val) = 0;
  foreach my $string (sort {$p{$b} <=> $p{$a}} keys %p) {
    next unless 0 < ($val = 0 + $p{$string});
    $sum += $val;
    push @cases, [$sum,$string];
  }
  return sub { return undef } unless $sum;
  return sub { # a closure...
    $val = rand($sum);
    foreach my $c (@cases) { return $c->[1] if $val < $c->[0] }
    return undef;
  };
}
__END__
Sample output:
Kekengsi wikeleke li rifehe gawopir chi chifo kore eil lundishe eind
shutay endgelu urchiwe kaoinkan uyhege rungroi i hiwa aseng doypal gafar
abe fafal rera sesubeti tendil ferul chane koreshey wunfu pautashe ri ohi
hangga shonsere tu showoy shinkfeybio tugenk.
That program takes a very analytical, bottom-up approach. The good thing
is that one can tune the probabilities. The bad part is that it takes
conscious effort to do so, and who wants that?
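
One way to make the tuning less of a guessing game is to tally a big
batch of draws and compare them with the weights you asked for. The
following is only a minimal sketch of that idea: make_weighted_picker
and tally are names made up for this illustration (they are not in the
program above), and the picker just repeats the running-total trick
from def_randomizer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: same running-total idea as def_randomizer above.
sub make_weighted_picker {
  my %weight = @_;
  my($sum, @cases) = (0);
  foreach my $item (sort keys %weight) {
    next unless $weight{$item} > 0;     # skip zero or negative weights
    $sum += $weight{$item};
    push @cases, [$sum, $item];         # [running total, item]
  }
  return sub {
    my $r = rand($sum);
    foreach my $c (@cases) { return $c->[1] if $r < $c->[0] }
    return undef;
  };
}

# Hypothetical helper: call the picker $draws times and count the results.
sub tally {
  my($picker, $draws) = @_;
  my %seen;
  $seen{ $picker->() }++ for 1 .. $draws;
  return \%seen;
}

# The nucleus weights from the program above.
my %weights = (a => 10, e => 15, i => 9, u => 8, o => 12);
my $draws   = 54_000;
my $total   = 0;
$total += $_ for values %weights;

my $counts = tally(make_weighted_picker(%weights), $draws);
foreach my $v (sort keys %weights) {
  printf "%s  wanted %.3f  got %.3f\n",
    $v, $weights{$v} / $total, ($counts->{$v} || 0) / $draws;
}
```

If the "got" column drifts far from the "wanted" column, something is
off in the weights or the picker, not in your ear.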
(And clearly, the phonotactics I've put in there are pretty weird, but
were just for testing. It's like Swahili or something, except for those
strange "nk"s and "nd"s.)
I bet I could cobble something together out of Markov-chain models that
could be more emergentist. So you'd say "I want it a bit like Swahili,
more like Malay, and with a touch of Cantonese, so do it!" and it'd
spool out what you'd ordered (after feeding it thru some sort of
syllabic sanity-filter), assuming that one had samples of such languages
on hand. I've done stuff like that before (
http://search.cpan.org/search?dist=Games-Dissociate ), just not in a
sort of {multi/con}lingual context. Would anyone be interested in using
something like this? I'm curious myself what its results would look
like.
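
To give a rough idea of what a letter-level Markov-chain generator
could look like, here is a minimal sketch. Everything in it is an
assumption made for illustration: the names build_model and
generate_word, the order-2 letter context, and the tiny hand-picked
sample list standing in for real per-language wordlists. It is not
how Games::Dissociate works, and it has no syllabic sanity-filter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative sample only; a real run would read a wordlist per language.
my @sample = qw(
  jambo habari rafiki safari simba kitabu chakula karibu asante
  makan minum rumah jalan orang bulan tangan kepala selamat
);

my $ORDER = 2;   # how many preceding letters predict the next one

# Build the model: each $order-letter context maps to the list of letters
# seen after it. Repeats stay in the list, so frequent continuations get
# drawn more often. '^' pads the start of a word, '$' marks its end.
sub build_model {
  my($order, @words) = @_;
  my %next;
  foreach my $w (@words) {
    my $padded = ('^' x $order) . lc($w) . '$';
    foreach my $i (0 .. length($padded) - $order - 1) {
      push @{ $next{ substr($padded, $i, $order) } },
        substr($padded, $i + $order, 1);
    }
  }
  return \%next;
}

# Walk the chain from the start padding until the end marker is drawn
# or the word reaches $max_len letters.
sub generate_word {
  my($model, $order, $max_len) = @_;
  my($context, $word) = ('^' x $order, '');
  while (length($word) < $max_len) {
    my $choices = $model->{$context} or last;
    my $letter  = $choices->[rand @$choices];
    last if $letter eq '$';
    $word   .= $letter;
    $context = substr($context . $letter, -$order);
  }
  return $word;
}

my $model = build_model($ORDER, @sample);
print join(' ', map { generate_word($model, $ORDER, 12) } 1 .. 10), "\n";
```

Asking for "more like Malay" could then be as crude as passing the
Malay sample to build_model several times over, so that its letter
sequences outweigh the others.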
--
Sean M. Burke sburke@cpan.org http://www.spinn.net/~sburke/