
Re: Most developed conlang

From: Alex Fink <a4pq1injbok_0@...>
Date: Monday, April 23, 2007, 6:09
On Sun, 22 Apr 2007 18:47:58 -0400, Jim Henry <jimhenry1973@...> wrote:

>On 4/22/07, Henrik Theiling <theiling@...> wrote:
>
>> So your main criterion would be predictability of semantics?  If
>> predictable => no new word, if not predictable => new word.  This
>> seems, well, very reasonable for composing a lexicon.  Of course there
>> will be difficult cases, but let's ignore them for now.
>>
>> This means that for counting a conlang's words, we probably should:
>>
>>  - also count phrases ('bubble sort algorithm') and idioms
>>
>>  - not count lexicon entries that are due to irregular forms
>>    ('saw' cf. 'see')
>>
>>  - count polysynthetically constructed words several times,
>>    excluding structures that are semantically clear operations,
>>    but counting all irregularly derived concepts

What you're proposing to count there seems to be essentially _listemes_ [Wiktionary def: (linguistics) An item that is memorized as part of a list, as opposed to being generated by a rule.], except that suppletive and 'irregular' forms do count as listemes.  (Though, afaik, it's debatable whether strong verbs are really irregular in the relevant sense.)  That's a perfectly reasonable criterion for counting, as I see it.  In particular, it correlates pretty closely with the amount of work the conlanger will have had to put into designing the lexicon: each listeme requires specification somewhere, whereas regularly rule-derived forms don't need to be specified.
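As a toy illustration of counting by listemehood (the entries and flags here are invented), the bookkeeping is simple once each lexicon entry records whether it must be memorized or is rule-generated, e.g. in Python:

# A listeme must be memorized somewhere; a regularly rule-derived
# form needs no listing.  Note that on this criterion the suppletive
# past "saw" *does* count, unlike in the scheme quoted above.
lexicon = [
    {"form": "see",  "listeme": True},   # root: must be listed
    {"form": "saw",  "listeme": True},   # suppletive past: memorized
    {"form": "sees", "listeme": False},  # regular -s form: rule-generated
]
print(sum(1 for e in lexicon if e["listeme"]))  # 2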
>Of course in starting a new lexicon for a new language one
>could easily have a field for "semantic transparency",
>or perhaps an integral field indicating how many words
>(or "lexical items") each entry counts for (1 for root words
>and opaque compounds, 0 for irregular forms and transparent
>compounds; 1 for idioms and stock phrases?).
>
>On the other hand, transparency/opacity is a continuous rather
>than a boolean quality.  Some "transparent" compounds are more
>transparent than others, some "opaque" compounds are more opaque
>than others; and the same is true of idiomatic phrases.  So maybe
>the semantic transparency field gets real numbers ranging from
>0.0 to 1.0, and the overall word count for the language would
>probably be non-integral.
>
>On the gripping hand, maybe the "semantic transparency" needs to
>be applied at the morpheme boundary level rather than the word
>level.  For instance, in E-o "el-don-ej-o" there are three
>morpheme boundaries, one perfectly transparent (ej-o), one
>somewhat transparent (between el-don and -ej), and one almost
>completely opaque (el-don).  We might assign them transparency
>(or rather opacity) scores of
>
>  el- don -ej -o
>    0.95, 0.20, 0.0
>
>or thereabouts.  How would we combine these to get an overall
>opacity score for the word?  Not by simply averaging them;
>"eldonejo" is slightly more opaque than "eldoni".  Nor by adding,
>because we don't want a score over 1.0.  Another complicating
>factor is that we don't want the presence of both "eldoni" and
>"eldonejo" in the lexicon to inflate the count too much, since
>the latter builds on the former and is almost transparent if you
>already know "eldoni".
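To make the combination problem concrete, here's a rough Python sketch of those per-boundary scores (the data structure is made up for illustration; the numbers are from the quote above):

# Per-boundary opacity scores for E-o "el-don-ej-o", innermost
# boundary first.
boundaries = [
    ("el|don", 0.95),  # almost completely opaque
    ("don|ej", 0.20),  # somewhat transparent
    ("ej|o",   0.00),  # perfectly transparent
]
scores = [s for _, s in boundaries]

# The two obvious ways of combining the scores both misbehave:
print(sum(scores) / len(scores))  # 0.383... -- but "eldonejo" should be
                                  # slightly *more* opaque than "eldoni" (0.95)
print(sum(scores))                # 1.15 -- runs over the 1.0 ceiling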
What's the problem here?  Only the outermost opacity should count, if you assume the branching is binary so that there is an outermost derivational operation.  In this case I gather the base of <eldonejo> is <eldoni>; so <eldon-> counts for 0.95 of a lexical item, <eldonej-> for 0.2, and <eldonejo> for none (if you reckon it in your count at all, which is a moot question).

Overall, though, I like this idea of non-integral counting, making opacity ~ compositionality of a derivation, or listemicity of an item, a fuzzy concept.  Now if only there were some way to systematically make statements like "the opacity of the derivation 'speak' > '(loud)speaker' is 0.6931"...
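And a quick sketch of that outermost-only rule, assuming binary branching (the stems, base links, and the root's 1.0 score are illustrative choices of mine; the 0.95/0.2/0.0 scores are from the example above):

# Each entry records its base and only the opacity of its outermost
# derivational step; a root is maximally opaque at 1.0.
lexicon = {
    "don-":     {"base": None,       "opacity": 1.00},  # root (illustrative)
    "eldon-":   {"base": "don-",     "opacity": 0.95},  # el- prefixation
    "eldonej-": {"base": "eldon-",   "opacity": 0.20},  # -ej suffixation
    "eldonejo": {"base": "eldonej-", "opacity": 0.00},  # -o ending: transparent
}

# Summing outermost opacities gives a non-integral count; listing
# "eldonejo" alongside its base inflates the total by only 0.2.
print(sum(e["opacity"] for e in lexicon.values()))  # 2.15

Alex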
