Re: Iterative conlang design with corpus analysis, Or, Build one to throw away
From: John Vertical <johnvertical@...>
Date: Thursday, April 20, 2006, 9:05
> > Do you (or anyone else) have figures for "permutation density" in
> > natlangs? Wondering what the "happy median" is.
> >
> > --
> > AA
> > http://conlang.arthaey.com
> >
>
>It's hard to do for natlangs since they don't generally have
>prescribed word forms. But it could be estimated. For English, for
>example, we could pick CV words and let V stand for any phonemic vowel
>sound in English and C represent any consonant including <w> and <y>.
>
>I might do this when I'm more awake.
>
>---larry
FWIW, I once did some mapping of English monosyllables (by hand). I don't
have the results any more, but there were definite tendencies, some of which
were fairly trivial. I didn't catch anything really surprising.
L = liquid, F = fricative, P = plosive, N = nasal
- Onset clusters were consistently about 3-4 times as rare as single (or
null) initials.
- C + w initial clusters were markedly rarer than s + C and C + L, which
seemed to be about equally common.
- C + L clusters favored unvoiced C.
- Among s + C clusters, the internal frequency appeared to be sP > {sN sl} >
shr > sw > sf.
- Three-consonant onsets seemed to prefer lighter codas; their frequency was
higher than what could be expected from their parts' frequencies. That is to
say, if <tr> and <st> were both 20% in use, <str> could be expected to be at
about 4% (20% of 20%) ... but might be found closer to 10% instead.
- Historical vowel mergers and splits were clearly visible: homophony
centered around /i ei ou/, while the /A U u/ columns were usually almost
empty, at the expense of /& V ju/. (I ignored the trap/bath split and most
yod-dropping.) /Oi O/ were also rare.
- A preference for "short" vowels was also apparent, but not by much.
- The labials : coronals : velars ratio (counting clusters by their least
sonorous member) seemed to be at least 3 : 5 : 2, possibly more separated at
some parts.
- Initial fricatives were surprisingly rare compared to stops and sonorants.
/f s S/ seemed to be about a third rarer, and /v z/ almost nonexistent.
- CV words were about 80-95% used, depending on how you count the vowel
phonemes.
- CVL words were a little less filled, but fairly full too. Maybe about
70-80%. Coda /rl/ was rare.
- CVN(P) words were only about 20-30% used. I didn't notice much density
difference between those with a single nasal coda and those with nasal +
plosive.
- CVP words were a bit more common, clearly around 30%.
- CVF words I didn't finish mapping, but from the initial results, the
density seemed to be around 20% too, maybe less.
All numbers should be taken as extremely inaccurate.
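
For anyone who wants to redo the <str> comparison from the list above, here
is a minimal Python sketch of the arithmetic. It assumes the two parts of a
cluster combine independently, which is exactly the assumption the
observation contradicts. The function name is made up for illustration, and
the 20% and 10% figures are the hypothetical ones from the example, not
measurements.

```python
def expected_cluster_share(share_first: float, share_second: float) -> float:
    """Share expected for a combined cluster if its two parts were independent."""
    return share_first * share_second


# Hypothetical figures from the example in the list, not measured values.
share_st = 0.20      # <st> "in use"
share_tr = 0.20      # <tr> "in use"
observed_str = 0.10  # roughly where <str> might actually be found

expected_str = expected_cluster_share(share_st, share_tr)
ratio = observed_str / expected_str

print(f"expected <str>: {expected_str:.0%}")
print(f"observed/expected: {ratio:.1f}x")
```

With these made-up inputs the cluster comes out about 2.5 times as common as
independence would predict; with real counts the ratio would of course
differ.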
Of course, since this was all done by someone speaking English as L2, for
all I know I missed hundreds of archaic and rare words meaning "the nearest
leg of a horse" or "a vessel for fermentation of mustard" etc. It shouldn't
affect the relative densities, however.
I'd prefer calling it "lexicalization" density rather than "permutation"
density, though.
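
To make the density figures reproducible, here is a rough Python sketch of
how such a number could be computed: enumerate every possible word of a given
shape and count how many are attested. Everything in it is an assumption for
illustration: the function name, the tiny consonant and vowel inventories,
and the toy word list. A real count would need a full pronouncing dictionary
and a decision on how to count the vowel phonemes.

```python
from itertools import product


def lexicalization_density(lexicon, slots):
    """Fraction of all possible slot combinations that occur in the lexicon.

    lexicon: set of words, each a tuple of phoneme symbols
    slots:   one list of allowed phonemes per position, e.g. [C, V]
    """
    possible = set(product(*slots))
    attested = possible & set(lexicon)
    return len(attested) / len(possible)


# Toy inventories and word list, invented for illustration only.
consonants = ["b", "d", "s", "m"]
vowels = ["i", "ei", "ou"]
toy_lexicon = {
    ("b", "i"), ("b", "ei"), ("b", "ou"),
    ("d", "ei"), ("d", "ou"),
    ("s", "i"), ("s", "ei"), ("s", "ou"),
    ("m", "i"), ("m", "ei"), ("m", "ou"),
    ("s", "t", "ei"),  # not a CV shape, so it does not count here
}

cv_density = lexicalization_density(toy_lexicon, [consonants, vowels])
print(f"CV density: {cv_density:.0%}")
```

The same function handles other shapes by passing more slots, for instance
[consonants, vowels, nasals] for CVN, given a list of nasals.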
John Vertical