
Re: making up words

From: Sean M. Burke <sburke@...>
Date: Saturday, March 23, 2002, 23:00
On 2002-03-23, Jonathan Knibb wrote:
>I use Jeffrey Henning's LangMaker software to create random phoneme
>sequences, then apply a couple of phonotactic rules to these by hand, and
>keep the results in a word list. When I want a new word, I decide (out of
>my own head) how many syllables it should have, then go to the appropriate
>bit of the wordlist. What I do then is to circle the first three unused

Interesting! I bet one could do well with just modifying this Perl program
I just cobbled together:

#!/usr/bin/perl
require 5;
use strict;

#~~~~ Start of configurables ~~~~

my @onsets = qw( p t k r l w b d g h f sh ch s );
my @codas  = qw( y r n l nd ng nk );

my @bad_clusters = qw(
  yy aa ee ii oo uu nng db bd pt tp iy dk tg kh gh ph nm mn nkk nkg
  mk np nb mg pd bt dt td gk kg
);
 # Yes, these could be made into phonotactic rules, but feh.
 # Fails to capture vowel harmony, or the fact that adjacent
 #  closed syllables are pretty rare.

####
#
# Define functions that produce an onset, a nucleus, and a coda.

*onset = def_randomizer_simple(@onsets, '','','');

*nucleus = def_randomizer( a => 10, e => 15, i => 9, u => 8, o => 12 );
 # For an equiprobable distribution, it would be just:
 #  *nucleus = def_randomizer_simple(qw(a e i o u));

*coda = def_randomizer_simple(@codas, ('') x (2 * @codas));

# Define function telling us how many syllables in a word
*syllable_count = def_randomizer(1 => 5, 2 => 15, 3 => 5, 4 => 2);

#~~~~ End of configurables ~~~~

use Text::Wrap;
my @words;
{
  Word:
  foreach my $w_i (1 .. 40) {
    my $syllable_count = syllable_count();
    my $word = '';

    Syllable:
    foreach my $s_i (1 .. $syllable_count) {
      my $syllable = onset() . nucleus() . coda();
      $word .= $syllable;
    }

    foreach my $c (@bad_clusters) {
      redo Word unless index($word, $c) == -1;
    }
    push @words, $word;
  }
}
print wrap(' ', '', ucfirst(join ' ', @words) . ".\n");
exit;

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Randomizer metafunctions:

sub def_randomizer_simple {
  # a function that generates functions that pick
  #  from EQUALLY weighted string alternatives.
  return sub { return undef } unless @_;
  my @x = @_;
  return sub { return $x[rand @x] };
}

sub def_randomizer {
  # A function that generates functions that pick
  #  from unequally weighted string alternatives.
  # So def_randomizer( x => 10, y => 13, z => 7 )
  #  produces a function that imagines 30 cards (10+13+7), of
  #  which 10 say 'x', 13 say 'y', and 7 say 'z'; and calling
  #  the function randomly draws a card from the deck, returns
  #  what's on the card, and puts the card back.

  my %p = @_; # the distribution: foo => 10, bar => 2, etc.
  my($sum, @cases, $val) = 0;
  foreach my $string (sort {$p{$b} <=> $p{$a}} keys %p) {
    next unless 0 < ($val = 0 + $p{$string});
    $sum += $val;
    push @cases, [$sum,$string];
  }
  return sub { return undef } unless $sum;

  return sub { # a closure...
    $val = rand($sum);
    foreach my $c (@cases) { return $c->[1] if $val < $c->[0] }
    return undef;
  };
}
__END__

Sample output:

 Kekengsi wikeleke li rifehe gawopir chi chifo kore eil lundishe eind
shutay endgelu urchiwe kaoinkan uyhege rungroi i hiwa aseng doypal
gafar abe fafal rera sesubeti tendil ferul chane koreshey wunfu
pautashe ri ohi hangga shonsere tu showoy shinkfeybio tugenk.

That program takes a very analytical, bottom-up approach. The good thing
is that one can tune the probabilities. The bad part is that it takes
conscious effort to do so, and who wants that? (And clearly, the
phonotactics I've put in there is pretty weird, but was just for testing.
It's like Swahili or something, except for those strange "nk"s and "nd"s.)

I bet I could cobble something together out of Markov-chain models that
could be more emergentist. So you'd say "I want it a bit like Swahili,
more like Malay, and with a touch of Cantonese, so do it!" and it'd spool
out what you'd ordered (after feeding it thru some sort of syllabic
sanity-filter), assuming that one had samples of such languages on hand.
I've done stuff like that before ( ), just not in a sort of
{multi/con}lingual context.

Would anyone be interested in using something like this? I'm curious
myself what its results would look like.

--
Sean M. Burke
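
The Markov-chain idea floated above could be sketched roughly as follows. This is only an illustration, not the program the post alludes to: the function names train_model, generate_word, and pick_weighted, the '^' and '$' boundary markers, the order-2 letter context, and the tiny Swahili and Malay word samples are all assumptions made for the example. Each language's sample adds transition counts scaled by a weight, which is one simple way of asking for "a bit like Swahili, more like Malay".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch: a letter-level Markov model blended from several
# weighted word samples. All names and sample data here are assumptions.

# Build the model. Each source is [ $weight, @words ]; every observed
# transition (context of $order letters => next letter) adds $weight.
# '^' pads the start of a word and '$' marks its end, so the samples
# must not contain those characters. $order must be at least 1.
sub train_model {
    my ($order, @sources) = @_;
    my %model;
    foreach my $source (@sources) {
        my ($weight, @words) = @$source;
        foreach my $word (@words) {
            my $padded = ('^' x $order) . lc($word) . '$';
            foreach my $i (0 .. length($padded) - $order - 1) {
                my $context = substr($padded, $i, $order);
                my $next    = substr($padded, $i + $order, 1);
                $model{$context}{$next} += $weight;
            }
        }
    }
    return \%model;
}

# Draw one key from a hash of positive weights, with probability
# proportional to its weight.
sub pick_weighted {
    my ($weights) = @_;
    my @items = sort keys %$weights;
    my $sum = 0;
    $sum += $weights->{$_} foreach @items;
    my $roll = rand($sum);
    foreach my $item (@items) {
        $roll -= $weights->{$item};
        return $item if $roll < 0;
    }
    return $items[-1];   # guard against floating-point leftovers
}

# Walk the model from the start marker until the end marker is drawn
# or the word reaches $max_len letters.
sub generate_word {
    my ($model, $order, $max_len) = @_;
    my $context = '^' x $order;
    my $word    = '';
    while (length($word) < $max_len) {
        my $choices = $model->{$context} or last;
        my $next = pick_weighted($choices);
        last if $next eq '$';
        $word   .= $next;
        $context = substr($context . $next, -$order);
    }
    return $word;
}

# Toy samples only; a real run would want far larger word lists.
my @swahili = qw(rafiki safari simba jambo hakuna matata kitabu mtoto nyumba chakula);
my @malay   = qw(makan rumah orang bunga selamat pagi terima kasih jalan buku);

my $order = 2;
my $model = train_model($order, [1, @swahili], [2, @malay]);   # Malay counts double
print join(' ', map { generate_word($model, $order, 12) } 1 .. 10), "\n";
```

With samples this small the output will mostly recombine fragments of the input words, and nothing here checks syllable structure, so the "syllabic sanity-filter" step would still be needed on top.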