Conlang: Re: Self-segregating Semitic Morphology (Logan Kearsley, Sep 8 '08, 16:13)

Re: Self-segregating Semitic Morphology

From:	Logan Kearsley <chronosurfer@...>
Date:	Monday, September 8, 2008, 16:13


Having slept for a while, I think I've got some good answers to my own question. Sleeping is good for solving a lot of problems. It's still good to have alternate suggestions, though; the stuff my subconscious came up with definitely doesn't constitute the only option.

On Mon, Sep 8, 2008 at 9:01 AM, Jim Henry <jimhenry1973@...> wrote:

> On Mon, Sep 8, 2008 at 12:53 AM, Logan Kearsley <chronosurfer@...> wrote:
>> Thought 1- building vocabulary based on consonantal roots allows for a
>> large and powerful derivational system without having to resort to
>> long strings of agglutinating affixes.
>> Thought 2- self-segregating morphology is kinda cool.
>>
>> It would be neat if these two ideas could be combined. Unfortunately,
>
> I see a couple of obvious ways to do it.
>
> 1. Use one set of consonants for your triliteral roots, and another
> set of consonants that can occur in suffixes. There are no prefixes.
> Any time you encounter a consonant from the set of root consonants
> following one or more suffix consonants and vowels, you've found the
> beginning of a word, and any time you find a consonant from the set
> of suffix consonants, you've found the beginning of a suffix. If you find
> a bunch of root consonants in a row separated only by vowels,
> then the first, fourth, seventh etc. indicate the start of a new word.
>
> 2. Or allow prefixes too, and have a third set of consonants used
> only in prefixes.
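Jim's first method can be sketched as a small segmenter. This is a minimal sketch, not anything from the thread: `ROOT_CONSONANTS` and `SUFFIX_CONSONANTS` are hypothetical inventories picked for illustration, and vowels are assumed to stay with the preceding consonant, which is what forbidding vowel-initial root patterns (and using CV+ suffixes) buys you.

```python
# Sketch of Jim Henry's method 1 (root-consonant set vs. suffix-consonant set).
# The consonant inventories below are hypothetical; the post names no sets.
ROOT_CONSONANTS = set("ptkbdgmnsz")  # may occur only in triliteral roots
SUFFIX_CONSONANTS = set("lrwjh")     # may occur only in suffixes
VOWELS = set("aeiou")
ROOT_LENGTH = 3                      # triliteral roots


def segment_root_suffix(stream: str) -> list[str]:
    """Split an unspaced stream into words.

    A word starts at a root consonant that follows a suffix consonant, or
    after every ROOT_LENGTH root consonants in an unbroken run of them.
    Vowels stay with the preceding consonant (assumed: root patterns may
    not begin with a vowel, and suffixes have the shape CV+).
    """
    words: list[str] = []
    current = ""
    roots_seen = 0     # root consonants in the current word
    in_suffix = False  # True once the current word has a suffix consonant
    for ch in stream:
        if ch in ROOT_CONSONANTS:
            if current and (in_suffix or roots_seen == ROOT_LENGTH):
                words.append(current)
                current, roots_seen, in_suffix = "", 0, False
            roots_seen += 1
        elif ch in SUFFIX_CONSONANTS:
            in_suffix = True
        elif ch not in VOWELS:
            raise ValueError(f"unknown segment {ch!r}")
        current += ch
    if current:
        words.append(current)
    return words


print(segment_root_suffix("katabalusidimpokuz"))
# ['katabalu', 'sidim', 'pokuz']
```

Here "katabalu" is split off when the root consonant s follows the suffix consonant l, and "sidim" is closed by the run rule when p would be its fourth root consonant.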

If the roots in the class that can have vowel-pattern derivations applied are all the same length (which is a basic assumption for how this sort of thing works), I don't think you need three sets of consonants. Say that Set 1 is for prefixes, and Set 2 is for roots and suffixes. Then, a transition from S1 to S2 marks the beginning of a root, which continues for a known number of consonants, and everything after that until the next word must be suffixes. A transition from S2 to S1 marks the boundary between words, but that fails if the next word has no prefixes. That could be solved by mixing the consonant types in roots, requiring that the first one be from S1 and the later ones from S2.
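The two-set variant can be sketched the same way, assuming hypothetical sets `S1` (prefix consonants plus the first root consonant) and `S2` (the later root consonants plus suffix consonants) and triliteral roots. Word breaks fall at S2-to-S1 transitions; inside a word, every S1 consonant precedes every S2 consonant, so the root starts at the last S1 consonant. Stray vowels between words are simply kept with the preceding word here, which is an assumption, not a solution to that problem.

```python
# Sketch of the two-set variant: S1 = prefix consonants plus the FIRST root
# consonant; S2 = the remaining root consonants plus suffix consonants.
# The inventories are hypothetical, chosen only for illustration.
S1 = set("ptkqc")
S2 = set("bdgmnlrsz")
VOWELS = set("aeiou")
ROOT_LENGTH = 3


def segment_two_sets(stream: str) -> list[str]:
    """Start a new word at every S2 -> S1 transition (vowels are skipped over).

    Vowels stay with the preceding word; where a stray vowel really
    belongs is left open.
    """
    words: list[str] = []
    current = ""
    last = None  # class of the most recent consonant: "S1", "S2" or None
    for ch in stream:
        if ch in S1:
            if last == "S2":
                words.append(current)
                current = ""
            last = "S1"
        elif ch in S2:
            last = "S2"
        elif ch not in VOWELS:
            raise ValueError(f"unknown segment {ch!r}")
        current += ch
    if current:
        words.append(current)
    return words


def skeleton(word: str) -> tuple[str, str, str]:
    """Return the (prefix, root, suffix) consonants of one word.

    The root starts at the last S1 consonant and runs for ROOT_LENGTH
    consonants, all after the first of which must be from S2.
    """
    cons = [ch for ch in word if ch not in VOWELS]
    s1_positions = [i for i, ch in enumerate(cons) if ch in S1]
    if not s1_positions:
        raise ValueError(f"no root onset in {word!r}")
    start = s1_positions[-1]
    end = start + ROOT_LENGTH
    root = cons[start:end]
    if len(root) < ROOT_LENGTH or any(ch not in S2 for ch in root[1:]):
        raise ValueError(f"malformed root in {word!r}")
    return "".join(cons[:start]), "".join(root), "".join(cons[end:])


words = segment_two_sets("tikadimalukusari")
print(words)                         # ['tikadimalu', 'kusari']
print([skeleton(w) for w in words])  # [('t', 'kdm', 'l'), ('', 'ksr', '')]
```

The second word has no prefix, yet the break is still found, because its first root consonant k comes from S1.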

> That limits your options on root vowel patterns; you couldn't have
> vowels before the first root consonant, unless they were part
> of a prefix. You could have e.g.
>
> CaCiCu
> CCuCa
> CiaCCi
> CaCauC
>
> etc., with suffixes of form CV+, but not root patterns like
>
> aCCiC
> uCiCaC
>
> etc.

Mm... I'm not seeing why. It definitely *does* limit the number of consonantal roots, which could be annoying depending on how many consonants you start out with; but that's a different concern from limiting the derivation patterns available.

> Suppose you have 20 consonants and 5 vowels, with
> 15 consonants allowed in roots and the other 5 in suffixes;
> that gives you 3375 possible roots and 150 CV and CVV
> suffixes. Not sure offhand how to calculate the possible
> root vowel patterns, but there should be scads of them too.
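Jim's figures follow from two assumptions the quote doesn't state: a root may repeat a consonant, and a CVV suffix may repeat its vowel. The arithmetic, spelled out:

```python
# Jim Henry's example inventory: 20 consonants and 5 vowels, with 15 consonants
# reserved for roots and 5 for suffixes. Assumed here (not stated in the post):
# roots may repeat a consonant, and a CVV suffix may repeat its vowel.
ROOT_CONSONANTS = 15
SUFFIX_CONSONANTS = 5
VOWELS = 5

roots = ROOT_CONSONANTS ** 3               # ordered triples C1-C2-C3
cv = SUFFIX_CONSONANTS * VOWELS            # CV suffixes
cvv = SUFFIX_CONSONANTS * VOWELS * VOWELS  # CVV suffixes
print(roots, cv, cvv, cv + cvv)            # 3375 25 125 150
```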

Number of patterns = (n+1)^(m+1)-1, where n is the number of vowels available for use in derivation patterns, m is the number of consonants in a root, all consonant clusters are allowed, and we assume that there are no strings of multiple vowels and that there must be at least one vowel somewhere in a word. (Each of the m+1 slots before, between, and after the root consonants is either empty or holds one of the n vowels, and the one all-empty pattern is subtracted.) There are scads even for fairly small numbers of vowels. However, modifications are in order to account for additional restrictions (like root words can't start with vowels, or root words must start with vowels, or 3-consonant clusters aren't allowed), and each one of those tends to drastically reduce exactly how many scads you get. Relaxing the assumption that there are no strings of vowels actually doesn't matter much, because it's equivalent to just increasing the vowel inventory.

Assume a 4-vowel system with triliteral roots, for illustrative purposes:

(n+1)^(m+1)-1 = 5^4-1 = 624 derivational patterns, more than Arabic, I think.

If you require that root words don't start with vowels, then it becomes:

(n+1)^m-1 = 5^3-1 = 124 derivational patterns, a lot fewer than Arabic.

If 3-consonant clusters aren't allowed:

n*(n+2)*(n+1)^2 = 4*6*5^2 = 600, which is pretty good (I think; I'm not absolutely sure that I derived that last expression correctly).

On Mon, Sep 8, 2008 at 9:53 AM, R A Brown <ray@...> wrote:
[...]
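A brute-force count is an easy way to sanity-check these closed forms, including the last one. This sketch enumerates every way to fill the m+1 vowel slots (each empty or one of n vowels) and applies each restriction as a filter; `count_patterns` and its keyword flags are illustrative names, not anything from the thread. Note that the closed form n*(n+2)*(n+1)^2 is specific to triliteral roots (two inner slots, not both empty).

```python
from itertools import product


def count_patterns(n: int, m: int, *, no_initial_vowel: bool = False,
                   no_ccc: bool = False) -> int:
    """Brute-force count of vowel patterns around an m-consonant root.

    Slots 0..m sit before, between and after the root consonants; each slot
    is empty (None) or holds one of n vowels, and at least one slot is filled.
    """
    total = 0
    for slots in product([None, *range(n)], repeat=m + 1):
        if all(s is None for s in slots):
            continue  # a word needs at least one vowel
        if no_initial_vowel and slots[0] is not None:
            continue  # root word may not start with a vowel
        inner = slots[1:-1]
        if no_ccc and any(a is None and b is None
                          for a, b in zip(inner, inner[1:])):
            continue  # two adjacent empty inner slots = a CCC cluster
        total += 1
    return total


n, m = 4, 3
print(count_patterns(n, m), (n + 1) ** (m + 1) - 1)                   # 624 624
print(count_patterns(n, m, no_initial_vowel=True), (n + 1) ** m - 1)  # 124 124
print(count_patterns(n, m, no_ccc=True), n * (n + 2) * (n + 1) ** 2)  # 600 600
```

For n = 4 and triliteral roots, the enumeration agrees with all three expressions.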

> ...and a third method might be along the lines John Cowan outlined for
> xuxuxi:
>
> {quote}
> xuxuxi uses vowel harmony/disharmony to resolve the problem.
> All multi-syllable words are stressed on the first syllable,
> and then the other syllables of the word, except the last,
> have vowel harmony. The last syllable of the word has disharmony.
> Any remaining syllables before the next stressed syllable are
> monosyllabic.

That's the sort of thing that I would count as drastically reducing the number of patterns available, because it restricts the vowel inventory available for use in any particular word.

> Here's the harmony/disharmony table:
>
> first   medial    last
> a       a, e, o   i, u
> e       a, e, i   o, u
> i       a, e, i   o, u
> o       a, o, u   i, e
> u       a, o, u   i, e

Time for more math to see if this is really as restricting as I think it is. You've got 5 initial vowels, 3 medial vowels, and 2 final vowels in every case. If there's only 1 vowel, it must be initial; if there are 2 vowels, they must be an initial and a final; if there are three or four vowels, you get all three choices. Vowels can appear in four positions around a triliteral root:

VCCC CVCC CCVC CCCV                   4*5     =  20
VCVCC VCCVC VCCCV CVCVC CVCCV CCVCV   6*5*2   =  60
VCVCVC VCCVCV VCVCCV CVCVCV           4*5*3*2 = 120
VCVCVCV                               5*3*3*2 =  90
                                      total   = 290

Barely more than what you get with 3 unrestricted vowels (4^4-1 = 255), and a small fraction of what you get with 4 (624), let alone 5 (6^4-1 = 1295).

On Mon, Sep 8, 2008 at 9:57 AM, Lars Mathiesen <thorinn@...> wrote:
[...]
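The hand count can be checked by generating the patterns directly from the harmony table, following the reading used above: within one word, the first vowel is free, medial vowels come from that row's medial column, and the final vowel from its last column. This is a sketch under that reading; the stress and monosyllable parts of the xuxuxi rule aren't modeled, and `harmony_patterns` is a hypothetical helper.

```python
from itertools import combinations, product

# John Cowan's xuxuxi table as quoted: first vowel -> (medial options, last options).
HARMONY = {
    "a": ("aeo", "iu"),
    "e": ("aei", "ou"),
    "i": ("aei", "ou"),
    "o": ("aou", "ie"),
    "u": ("aou", "ie"),
}


def vowel_sequences(k: int):
    """Yield every k-vowel sequence the table allows (first, medial*, last)."""
    for first, (medial, last) in HARMONY.items():
        if k == 1:
            yield first
            continue
        for mids in product(medial, repeat=k - 2):
            for fin in last:
                yield first + "".join(mids) + fin


def harmony_patterns(m: int = 3) -> list[str]:
    """All patterns around an m-consonant root, e.g. 'aCiCCu' for m = 3."""
    patterns = []
    for k in range(1, m + 2):                         # number of vowels
        for filled in combinations(range(m + 1), k):  # which slots hold them
            for vowels in vowel_sequences(k):
                it = iter(vowels)
                pattern = ""
                for slot in range(m + 1):
                    if slot in filled:
                        pattern += next(it)
                    if slot < m:
                        pattern += "C"
                patterns.append(pattern)
    return patterns


print(len(harmony_patterns()), (3 + 1) ** 4 - 1)  # 290 255
```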

> Looking up self-segregating morphology on Conlang Wikia, it looks like
> the accepted definition is that morpheme or word boundaries should be
> immediately obvious without full knowledge of the lexicon.
>
> In a semitic style language, the morphemes of a word aren't combined
> in sequence, so they don't have boundaries as such. You may be
> thinking that you need to make self-segregating _syllables_, but I
> don't think that serves any purpose in this context. You will probably
> have to come up with another definition of self-segregating to be able
> to play.

I was mainly thinking of self-segregating words. Yes, obvious internal segregation can't easily be applied to derivation patterns that may contain multiple morphemes, so I wasn't worrying about that. It would be nice, in addition to segregating words, if you could segregate affix morphemes from the roots as well, but that's a secondary consideration for me.

> And if you want self-segregating phonological words, you're actually
> better off than with sequential morphemes. Self-segregation (or
> self-synchronization in general coding theory) needs redundancy, which
> is the same as 'populating the pattern space sparsely'. And since your
> words are known to be built on tri-consonantal roots, marking your
> word boundaries will only need about the same redundancy per word that
> syllable-based schemes do per syllable, so your pattern space can be
> more densely populated.

That's a good point. In the limit where there are no affixes and *all* words in the language have the same root structure, segregation is trivially easy: every set of three consonants is a new word, and you just need some way of occasionally disambiguating whether a stray vowel goes with the last word or the next. (That system is fragile, though; it depends on knowing exactly where the speech stream starts. If you come in in the middle, you'll be out of sync, and we'd like to have some way of fixing that.) However, that's a really ridiculous limit. We'd probably like to have the occasional grammatical particle or anaphor or something that has its own form distinct from the root derivation system.

One option became Exceedingly Obvious just after I woke up this morning: mark word boundaries with successive vowels. If you've already got the assumption that derivation patterns only use solitary vowels, then it's natural to think that two vowels in a row must belong to separate roots. This requires the restriction that every word must begin and end with a vowel, which means that the only single-syllable words will be single vowels, and the number of possible derivations is restricted to:

n^2*(n+1)^(m-1)

assuming that all possible consonant clusters are allowed, so you don't need any internal vowels. With 4 vowels and triliteral roots, that results in 4^2*5^2 = 400 derivation patterns. Not bad. If we require at least 1 internal vowel, then with triliteral roots we get n^2 choices for the edge vowels times (n+1)^2-1 = n*(n+2) for the two internal slots:

n^3*(n+2)

which comes out to 384. Still not bad. Using 5 vowels gets us up to 875 (out of a total possible of 6^4-1 = 1295... which might be overkill), which is quite good.

I'm not sure how I like the aesthetic of every word beginning and ending with a vowel, but it does work nicely. And it allows for the use of shorter roots mixed into the language as well.
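A minimal sketch of the successive-vowel scheme, assuming a hypothetical five-vowel inventory and a stream that already obeys the word shape (every word begins and ends with a vowel, and there are no vowel strings inside a word). `split_on_vv` breaks between any two adjacent vowels, and `count_vv_patterns` recounts the totals above by enumerating only the inner slots; both names are illustrative.

```python
from itertools import product

VOWELS = set("aeiou")  # hypothetical 5-vowel inventory


def split_on_vv(stream: str) -> list[str]:
    """Start a new word between any two adjacent vowels."""
    words: list[str] = []
    current = ""
    for ch in stream:
        if current and current[-1] in VOWELS and ch in VOWELS:
            words.append(current)
            current = ""
        current += ch
    if current:
        words.append(current)
    return words


def count_vv_patterns(n: int, m: int, min_inner_vowels: int = 0) -> int:
    """Count patterns whose first and last slots both hold a vowel.

    The m-1 inner slots are each empty or one of n vowels; at least
    min_inner_vowels of them must be filled.
    """
    count = 0
    for inner in product([None, *range(n)], repeat=m - 1):
        if sum(s is not None for s in inner) >= min_inner_vowels:
            count += n * n  # n choices each for the initial and final vowel
    return count


print(split_on_vv("akatibuimasroe"))  # ['akatibu', 'imasro', 'e']
print(count_vv_patterns(4, 3), count_vv_patterns(4, 3, 1), count_vv_patterns(5, 3, 1))
# 400 384 875
```

The single-vowel word "e" at the end is split off correctly because the preceding word ends in a vowel.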
This restricts the form of prefixes to VC{C}, and suffixes to {C}CV (although, if clusters are allowed, you could have single-consonant infixes which hijack the already-present initial and terminal vowels as well). And it very nearly requires that you only use one or the other, but that is fixable by designating a consonant (or class of consonants) to mark the boundaries of an affix list (as discussed above); pick the clusters right, and that doesn't even require adding an extra syllable.

-l.
