Conlang: Re: Self-segregating morphology again - in simpler terms, with list of methods (And Rosta, Apr 21 '06, 0:26)

Re: Self-segregating morphology again - in simpler terms, with list of methods

From:	And Rosta <and.rosta@...>
Date:	Friday, April 21, 2006, 0:26

Jim Henry, On 17/04/2006 19:53:

> On 4/17/06, And Rosta <and.rosta@...> wrote:
>
>> My conlang, Livagian, has unambiguous syntax parsed
>> incrementally with no lookahead, and it cuts the
>> Gordian knot of self-segregating morphology by extending
>> the input to the syntactic parser to the level of the
>> syllable (or potentially the segment). As each syllable
>> is read in, the syllable is looked up in the lexicogrammar,
> ........
>> The lexicon necessarily contains instructions for how to
>> deal with every string of syllables.
>>
>> The upshot is that a sentence can't necessarily be parsed
>> into words or morphemes on the basis of its phonological
>> form alone, but a sentence can be fully parsed on the
>> basis of its phonological form and the lexicogrammar,
>> without there being a need for self-segregating morphology
>> or for the complexities or constraints on morpheme shapes
>> that self-segregation schemes impose.
>
> Let me make sure I understand. You don't constrain
> the shapes of the individual morphemes so that they're
> inherently self-segregating -- but it seems you must
> constrain them relative to each other so that no morpheme
> looks like a prefix or suffix of another morpheme.
> So e.g., if there were a monosyllabic word "vek",
> there couldn't be any two or more syllable word that
> starts with "vek", but there's no reason /v/ or /e/ shouldn't
> occur in the middle or end of any word.

This is quite correct. But the rules can be sensitive to grammatical environment. So in one part of a sentence /vek/ might parse as a whole word, in another part it might parse as the initial portion of a word, and in yet another part it might parse as /ve/+/k/. To give a concrete example, in Livagian any word that can begin a number expression kicks into local operation a completely new lexicon (one that contains only entries corresponding to mathematical entities); in this lexicon, you get single-segment words. In principle, there are no phonological generalizations to be made about word/morpheme boundary placement. (In practice, Livagian does in fact use phonological criteria, namely tonal ones, to guide the parser in boundary insertion, but this is simply for the sake of convenience/simplicity/regularity, and is not a necessary element of the scheme.)
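A minimal sketch of the idea above, in Python, under loudly stated assumptions: the input is read one segment at a time, each lexicon is prefix-free (no word is the beginning of another word in the same lexicon), and each lexical entry names the lexicon that applies after it. The words "vek", "lo", "nu", "ve", "k" and "z", the lexicon names, and the function `parse` are all invented for illustration; none of them is actual Livagian.

```python
# Toy lexicons, invented for illustration (not real Livagian).
# Each entry maps a complete word to the lexicon in force after it.
LEXICONS = {
    "main": {"vek": "main", "lo": "main", "nu": "number"},
    # Hypothetical number lexicon with short words; "z" ends the number.
    "number": {"ve": "number", "k": "number", "z": "main"},
}


def parse(segments, mode="main"):
    """Split a string of segments into words with no lookahead.

    A boundary is inserted as soon as the segments read so far match
    an entry in the lexicon currently in force. This is safe only
    because each lexicon is assumed to be prefix-free.
    """
    words = []
    current = ""
    for segment in segments:
        current += segment
        lexicon = LEXICONS[mode]
        if current in lexicon:
            words.append((mode, current))
            mode = lexicon[current]  # the entry may switch lexicons
            current = ""
        elif not any(word.startswith(current) for word in lexicon):
            raise ValueError(f"no entry for {current!r} in {mode!r} lexicon")
    if current:
        raise ValueError(f"input ended in mid-word: {current!r}")
    return words


for mode, word in parse("veknuvekzvek"):
    print(mode, word)
```

In this toy run the same string "vek" comes out as one word under the main lexicon and as "ve" + "k" under the number lexicon, so the boundaries depend on the lexicon in force and not on phonological shape.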

> I'm thinking that I might probably impose a constraint
> like this on my next conlang over and above a
> self-segregation rule -- or perhaps instead of such.
> Self-segregating morphology is probably of benefit
> primarily to beginning learners of a language, whose
> vocabulary is still small.

IMO, the only respect in which self-segregation is useful is that it is a necessary condition for unambiguity. I don't believe it could have a positive impact on ease of use.

> But your rule would be
> helpful to more experienced speakers when
> looking up the occasional unknown word. They
> would not encounter words that look like they might
> be a compound of one word they already know
> and another word they don't know, which turn out
> to be irreducible roots instead (this occasionally
> happens to me in Esperanto).

The basic scheme I outlined -- in which nothing can be predicted from phonological shape and boundary placement is completely reliant on the lexicon -- can't deal with unknown words. (For proper names, you could have a proper name marker that kicks into local operation a set of rules guided by phonological shape.) It might therefore be prudent to have a default fallback rule, such as "If the current word has reached three syllables and there is no instruction in the lexical entry for this three-syllable string to parse in a further syllable, then assume that the three syllables constitute a complete but unknown word". Of course, you'd also need default rules about how to handle unknown words syntactically.

--And.
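The three-syllable fallback rule suggested above could be sketched roughly as follows, in Python. This is only one possible reading of the rule: the toy lexicon `WORDS`, the names `can_continue` and `parse_with_fallback`, and the treatment of leftover syllables at the end of the input are assumptions made for illustration, and the syntactic handling of unknown words is left out entirely.

```python
MAX_UNKNOWN_SYLLABLES = 3  # threshold taken from the rule quoted above

# Toy lexicon, invented for illustration: words as tuples of syllables.
WORDS = {("ve", "ka"), ("lo",), ("ti", "ra", "mu", "so")}


def can_continue(prefix):
    """True if some longer lexicon word starts with these syllables."""
    n = len(prefix)
    return any(len(word) > n and word[:n] == prefix for word in WORDS)


def parse_with_fallback(syllables):
    """Segment syllables into known words, guessing at unknown ones."""
    result = []
    current = ()
    for syllable in syllables:
        current += (syllable,)
        if current in WORDS:
            result.append(("known", current))
            current = ()
        elif len(current) >= MAX_UNKNOWN_SYLLABLES and not can_continue(current):
            # Fallback: the lexicon gives no instruction to read further,
            # so assume a complete but unknown word.
            result.append(("unknown", current))
            current = ()
    if current:
        # Assumption: syllables left over at the end are flagged, not guessed.
        result.append(("incomplete", current))
    return result


print(parse_with_fallback(["lo", "ba", "di", "gu", "ve", "ka"]))
```

One limitation of this sketch: an unknown word whose first syllables happen to form a known word (here, anything starting with "lo") is cut at the known word, so the fallback only helps when the unknown string does not collide with the lexicon.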