
Re: Iterative conlang design with corpus analysis, Or, Build one to throw away

From: John Vertical <johnvertical@...>
Date: Thursday, April 20, 2006, 9:05
> > Do you (or anyone else) have figures for "permutation density" in
> > natlangs? Wondering what the "happy median" is.
> >
> > --
> > AA
> > http://conlang.arthaey.com
> >
>
>It's hard to do for natlangs since they don't generally have
>proscribed word forms. But it could be estimated. For English, for
>example, we could pick CV words and let V stand for any phonemic vowel
>sound in English and C represent any consonant including <w> and <y>.
>
>I might do this when I'm more awake.
>
>---larry

FWIW I once did some mapping of English monosyllables (by hand...) I don't have the results any more, but there were definite tendencies, though some of them were fairly trivial. I didn't catch anything really surprising.

L = liquid, F = fricative, P = plosive, N = nasal

- Onset clusters were consistently about 3-4 times as rare as single (or null) initials.
- C + w initial clusters were markedly rarer than s + C and C + L, which seemed to be fairly equally common.
- C + L clusters favored unvoiced C.
- Among s + C clusters, the internal frequency appeared to be sP > {sN sl} > shr > sw > sf.
- Three-consonant onsets seemed to prefer lighter codas; their frequency was higher than what could be expected from their parts' frequencies. That is to say, if <tr> and <st> were both 20% in use, <str> could be expected to be at about 4% ... but might be found closer to 10% instead.
- Historical vowel mergers and splits were clearly visible: homophony centered around /i ei ou/, while the /A U u/ columns were usually almost empty, at the expense of /& V ju/. (I ignored the trap/bath split and most yod-dropping.) /Oi O/ were also rare.
- A preference for "short" vowels was also apparent, but not by much.
- The labials : coronals : velars ratio (counting clusters by their least sonorous member) seemed to be at least 3 : 5 : 2, possibly more separated in some parts.
- Initial fricatives were surprisingly rare compared to stops and sonorants. /f s S/ seemed to be about a third rarer, and /v z/ almost nonexistent.
- CV words were about 80-95% used, depending on how you count the vowel phonemes.
- CVL words were a little less filled, but fairly full too. Maybe about 70-80%. Coda /rl/ was rare.
- CVN(P) words were only about 20-30% used. I didn't notice much density difference between those with a single nasal coda and those with nasal + plosive.
- CVP words were a bit more common, clearly around 30%.
- CVF words I didn't finish mapping, but from the initial results, the density seemed to be around 20% too, maybe less.

All numbers should be taken as extremely inaccurate. Of course, since this was all done by someone speaking English as an L2, for all I know I missed hundreds of archaic and rare words meaning "the nearest leg of a horse" or "a vessel for fermentation of mustard" etc. It shouldn't affect the relative densities, however.

I'd prefer calling it "lexicalization" rather than "permutation" density, though.

John Vertical
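
For anyone wanting to redo this kind of count by machine rather than by hand, the two calculations in the post (the share of possible forms that are actual words, and the frequency a cluster "should" have given its parts) could be sketched roughly as below. This is only an illustration under assumptions: the function names, the tiny onset and vowel inventory, and the list of attested forms are all hypothetical stand-ins, not the original hand-mapped data, and the expected cluster share assumes the two parts combine independently.

```python
from itertools import product


def lexicalization_density(onsets, nuclei, attested):
    """Share of all possible onset+nucleus forms that are attested words."""
    possible = set(product(onsets, nuclei))
    # Ignore attested forms that fall outside the chosen inventory.
    used = possible & set(attested)
    return len(used) / len(possible)


def expected_cluster_share(share_a, share_b):
    """Expected share of a combined cluster if its parts were independent."""
    return share_a * share_b


def overrepresentation(observed, share_a, share_b):
    """How many times more common a cluster is than independence predicts."""
    return observed / expected_cluster_share(share_a, share_b)


# Hypothetical toy inventory and word list, for illustration only.
onsets = ["p", "t", "k", "s"]
nuclei = ["i", "ei", "ou"]
attested = [
    ("p", "i"), ("t", "i"), ("k", "i"), ("s", "i"),
    ("p", "ei"), ("s", "ei"),
    ("t", "ou"), ("s", "ou"),
    ("z", "u"),  # outside the inventory above, so it is not counted
]

print(f"CV density: {lexicalization_density(onsets, nuclei, attested):.0%}")

# The <tr>/<st>/<str> figures quoted in the post: 20% and 20% give 4%.
print(f"expected <str>: {expected_cluster_share(0.20, 0.20):.0%}")
print(f"over-representation at 10%: {overrepresentation(0.10, 0.20, 0.20):.1f}x")
```

With a real pronouncing dictionary in place of the toy list, the same density function would apply to each syllable shape (CV, CVL, CVP and so on) by swapping in the relevant inventories.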

Reply

Larry Sulky <larrysulky@...>