Re: Most developed conlang
From: Alex Fink <a4pq1injbok_0@...>
Date: Wednesday, April 25, 2007, 17:13
On Wed, 25 Apr 2007 13:36:30 +0200, Henrik Theiling <theiling@...> wrote:
>> > > el- don -ej -o
>> > > 0.95, 0.20, 0.0
>> ....
>> > > or thereabouts. How would we combine these to
>> > > get an overall opacity score for the word?
>>
>> > The total score should of course be the product of those values, since
>> > from the core pieces, each level of opaqueness influences the
>> > opaqueness of the whole by its morpheme boundary level.
>>
>> But multiplying the nonzero values would give a lower opacity
>> score for "eldonejo" than for "eldoni", when in
>> fact "eldonejo" is slightly more opaque than "eldoni".
>> And if we multiply all values then any word that has at least
>> one perfectly transparent morpheme boundary
>> would get a perfectly-transparent opacity score of 0!
>
>Oops! That's not what I wanted.
>
>> Maybe it would be better to multiply the
>> _transparency_ scores rather than _opacity_ scores,
>>
>> (1 - n_0) * ( 1 - n_1) * (1 - n_2 )....
>> in this case,
>> (1 - 0.95) * ( 1 - 0.20 ) * ( 1 - 0 )
>> = 0.05 * 0.80
>> = 0.04 (transparency)
>>
>> and then subtract that from 1 to get its
>> opacity score, = 0.96.
>
>You are absolutely right, that's much more sensible. I had actually
>mixed up the two levels.
Yeah, I think that's pretty clearly the Right Thing to do to measure the
aggregate opacity of one word. If opacity scores are supposed to be like
probabilities of not guessing the meaning of each derivation, then this
gives us the probability of guessing wrong at at least one derivation step.
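For concreteness, here's a quick sketch in Python of that combination rule; the function name is mine, and the per-boundary scores are the ones from upthread:

```python
def aggregate_opacity(boundary_opacities):
    """Combine per-boundary opacity scores into a whole-word score.

    Each score is read as the probability of failing to guess the
    meaning across that derivation step, so we multiply the
    transparencies (1 - o) and subtract from 1: the word is opaque
    if at least one step is misguessed.
    """
    transparency = 1.0
    for o in boundary_opacities:
        transparency *= 1.0 - o
    return 1.0 - transparency

# "eldonejo" with boundary scores 0.95, 0.20, 0.0 gives opacity 0.96
print(round(aggregate_opacity([0.95, 0.20, 0.0]), 2))
```

Note the perfectly transparent boundary (0.0) now contributes a factor of 1 and so no longer zeroes out the whole product.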
I'm not sure what the use of aggregate opacity scores for single words is --
not lexicon-counting anymore -- but as you say:
>But we might agree that this type of math may be mainly for fun
>anyway. :-)
>
>> > > "eldoni" and "eldonejo" in the lexicon to inflate
>> > > the count too much since the latter builds on
>> > > the former and is almost transparent if you already
>> > > know "eldoni". ...
>> >
>> > This is more tricky, yes. In the lexicon of Þrjótrunn, I have an
>> > operation that cuts off parts of an existing entry for construction of
>> > a new one. Maybe that would be feasible?
>>
>> Can you clarify further?
>
>Well, it is currently a simple string operation -- not linguistically
>founded, but still helpful for linguistics: you could chop off the
>last three characters of 'eldonejo' and use the stub 'eldon' for
>further operations.
This just sounds to me as if you have a citation form which is larger than
the actual stem from which derivations take place. I'd take the stem of the
word to be the base of all the derivations, then, and assign an opacity of 0
to the derivation that yields the citation form.
Or are you talking about back-formation?
>> I think Alex Fink's suggestions were probably
>> along the right lines, at least vis-a-vis lexicon
>> counting: count only the outermost branching.
>
>But when the results of previous branching steps are not part of the
>lexicon, e.g. because two morphemes are added to form a new word while
>adding only the first one leaves you with garbage, then it's not the
>best way, I think. However, I would propose to multiply all
>boundaries not resulting in anything already in the lexicon so that
>you get a recursive derivation tree.
>
>E.g. if you have ABC in the lexicon already and want to add ABCDE and
>if ABCD does not exist, then either assign the operation +DE one score
>and use this for a lexicon entry, or multiply the scores of +D and +E.
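A rough sketch of that recursive scheme, with a toy lexicon and made-up per-operation opacity scores (all the names and numbers here are illustrative assumptions, not anything from the thread):

```python
# Toy lexicon and per-operation opacity scores; purely illustrative.
LEXICON = {"ABC"}
OP_OPACITY = {"+D": 0.3, "+E": 0.5}

def derivation_opacity(base, ops):
    """Score a chain of derivational operations applied to `base`.

    Whenever an intermediate result is itself a lexicon entry, the
    chain restarts from there (that word is already known); runs of
    operations through intermediate non-words fold their scores
    together by multiplying the transparencies.
    """
    transparency = 1.0
    word = base
    for op in ops:
        word = word + op[1:]          # "+D" appends "D" to the string
        transparency *= 1.0 - OP_OPACITY[op]
        if word in LEXICON:           # known word: start a fresh chain
            transparency = 1.0
    return 1.0 - transparency

# ABCDE where ABCD is a gap: the scores of +D and +E combine
print(derivation_opacity("ABC", ["+D", "+E"]))
```

With these toy numbers the ABCD gap gives transparency 0.7 × 0.5 = 0.35, i.e. opacity 0.65; had ABCD been in the lexicon, only +E's score would count.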
Right, that's why I stuck a clause in my first message saying
| ... if you assume the branching is binary so that there is an outermost
| derivational operation.
It's really derivational operations we're counting, not boundaries. There
might not be any segmental material corresponding to some derivation, or
there might be discontiguity going on; you'll agree that if there's a single
derivation whose form is, say, an (undecomposable) circumfix, it's
meaningless to count the prefix and the suffix separately for opacity.
But that's probably not what you meant; you were talking about a case where
+D and +E are normally completely independent processes, but the word ABCD
forms a gap and just happens not to exist, while ABCDE does occur? As if,
for example, in English "derive" and "derivational" were both valid words
but *"derivation" was unexpectedly not found? It's not obvious to me what
to do then: just multiplying the scores of D and E is one idea, but the fact
that there is a gap is pretty surprising, and so maybe that should count for
something.
Can anyone cite any (nat- or con-)instances of this sort of situation?
Alex