Re: Most developed conlang
From: Alex Fink <a4pq1injbok_0@...>
Date: Monday, April 23, 2007, 6:09
On Sun, 22 Apr 2007 18:47:58 -0400, Jim Henry <jimhenry1973@...> wrote:
>On 4/22/07, Henrik Theiling <theiling@...> wrote:
>
>> So your main criterion would be predictability of semantics? If
>> predictable => no new word, if not predictable => new word. This
>> seems, well, very reasonable for composing a lexicon. Of course there
>> will be difficult cases, but let's ignore them for now.
>>
>> This means that for counting a conlang's words, we probably should:
>>
>> - also count phrases ('bubble sort algorithm') and idioms
>>
>> - not count lexicon entries that are due to irregular forms
>> ('saw' cf. 'see')
>> - count polysynthetically constructed words several times,
>> excluding structures that are semantically clear operations,
>> but counting all irregularly derived concepts
>
What you're proposing to count there seems to be essentially _listemes_
[Wiktionary def: (linguistics) An item that is memorized as part of a list,
as opposed to being generated by a rule.], except that suppletive and
'irregular' forms do count as listemes. But, afaik, it's debatable whether
strong verbs are really irregular in the relevant sense.
That's a perfectly reasonable criterion for counting, as I see it. In
particular it correlates pretty closely with the amount of work the conlanger
will have had to put into designing the lexicon: each listeme requires
specification somewhere, but regularly rule-derived forms don't need to be
specified.
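To make the distinction concrete, here's a toy sketch (Python; the glosses
and the rule are only illustrative): listemes get entries, and anything a
regular rule builds from them adds nothing to the count.

    # Listemes: items the conlanger has to specify somewhere.
    listemes = {
        "don-":   "give",       # root
        "el-":    "out of",     # root
        "eldon-": "publish",    # opaque compound, so a listeme in its own right
    }

    def ejo(gloss):
        # Fully regular, transparent rule: X + "-ejo" = "place for X-ing".
        # Forms built this way need no entry of their own.
        return "place for " + gloss + "-ing"

    word_count = len(listemes)   # 3; ejo("publish") etc. add nothing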
>Of course in starting a new lexicon for a new language one
>could easily have a field for "semantic transparency",
>or perhaps an integral field indicating how many words
>(or "lexical items") each entry counts for (1 for root words
>and opaque compounds, 0 for irregular forms and transparent compounds;
>1 for idioms and stock phrases?).
>
>On the other hand, transparency/opacity is a
>continuous rather than a boolean quality. Some
>"transparent" compounds are more tranparent
>than others, some "opaque" compounds are more
>opaque than others; and the same is true of
>idiomatic phrases. So maybe the semantic transparency
>field gets real numbers ranging from 0.0 to 1.0, and
>the overall word count for the language would probably
>be non-integral.
>
>On the gripping hand, maybe the "semantic transparency"
>needs to be applied at the morpheme boundary level
>rather than the word level. For instance, in E-o
>"el-don-ej-o" there are three morpheme boundaries,
>one perfectly transparent (ej-o), one somewhat
>transparent (between el-don and -ej), and one
>almost completely opaque (el-don). We might
>assign them transparency (or rather opacity)
>scores of
>
>el-      don      -ej     -o
>     0.95,    0.20,    0.0
>
>or thereabouts. How would we combine these to
>get an overall opacity score for the word?
>Not by simply averaging them; "eldonejo"
>is slightly more opaque than "eldoni". Nor
>by adding, because we don't want a score
>over 1.0. Another complicating factor is that
>we don't want the presence of both
>"eldoni" and "eldonejo" in the lexicon to inflate
>the count too much since the latter builds on
>the former and is almost transparent if you already
>know "eldoni".
What's the problem here? Only the outermost opacity should count, if you
assume the branching is binary so that there is an outermost derivational
operation. In this case I gather the base of <eldonejo> is <eldoni>; so
<eldon-> counts for 0.95 of a lexical item, <eldonej-> for 0.2, and
<eldonejo> for none (if you reckon it in your count at all, which is a moot
question).
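A quick sketch of that count (Python, reusing Jim's rough boundary scores,
which are of course just guesses): each entry contributes only the opacity
of its outermost derivational step, and the total comes out non-integral.

    # Opacity of the *outermost* boundary only, per the figures above.
    outermost_opacity = {
        "eldon-":   0.95,   # el- + don-: almost completely opaque
        "eldonej-": 0.20,   # eldon- + -ej: somewhat transparent
        "eldonejo": 0.00,   # eldonej- + -o: perfectly transparent
    }

    lexicon_size = sum(outermost_opacity.values())   # 1.15 lexical items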
Overall, though, I like this idea of non-integral counting, making the
opacity (non-compositionality) of a derivation, or the listemicity of an item, a fuzzy
concept. Now if only there were some way to systematically make statements
like "the opacity of the derivation 'speak' > '(loud)speaker' is 0.6931"...
Alex