Different Words with Large Common Substrings
|From:||Eldin Raigmore <eldin_raigmore@...>|
|Date:||Monday, October 13, 2008, 19:41|
Inspired by this:
|And Rosta's Livagian uses another method which, though not a self-
|segregating morphology in the strict sense, partly serves the same
|purpose with less restriction in the phonological shape of words. It
|requires a full knowledge of the lexicon to parse unambiguously, however.
|The key is that no actual morpheme must look like a prefix or suffix
|substring of another actual morpheme. So, for instance, if in a
|string "kesumalipe" you recognize "kesu" and "pe" as familiar morphemes,
|you know that this must be "kesu" followed by "ma li" or "mali" followed
|by "pe"; the fact that "kesu" is a real morpheme in a language meeting
|this criterion means that there cannot be another morpheme "kesuma"
|or "kesumali", and there can't be any morpheme like "lipe" or "malipe".
|But if you have only learned the phonology of the language and don't know
|much vocabulary yet, you can't deduce the morpheme boundaries from the
|phonotactics of the word; you would have to start by
|looking up "k" in the lexicon, then "ke", then "kes", until you
|find "kesu"; then start looking for "m", "ma", etc.
|Retrieved from "http://conlang.wikia.com/wiki/List_of_self-
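The lookup procedure in the quoted passage can be sketched in Python. This is only an illustration under stated assumptions: the function names `violations` and `segment` and the toy lexicon are hypothetical, and nothing here is taken from Livagian itself. Because no morpheme is a prefix of another, at most one morpheme can match at any position, so the first match found is the only candidate.

```python
def violations(lexicon):
    """Return pairs (a, b) where morpheme a is a proper prefix or suffix of morpheme b."""
    words = sorted(set(lexicon))
    return [(a, b) for a in words for b in words
            if a != b and (b.startswith(a) or b.endswith(a))]


def segment(text, lexicon):
    """Split text left to right by trying ever-longer strings against the lexicon.

    Returns a list of morphemes, or None if some stretch matches no morpheme.
    """
    morphemes = set(lexicon)
    result, start = [], 0
    while start < len(text):
        for end in range(start + 1, len(text) + 1):
            if text[start:end] in morphemes:
                result.append(text[start:end])
                start = end
                break
        else:
            # No lexicon entry begins at this position.
            return None
    return result


# Hypothetical toy lexicon built from the strings in the quoted example.
lexicon = ["kesu", "ma", "li", "pe"]
print(violations(lexicon))             # []
print(segment("kesumalipe", lexicon))  # ['kesu', 'ma', 'li', 'pe']
```

With a lexicon containing "mali" instead of "ma" and "li", the same call would return "kesu", "mali", "pe"; the criterion rules out having all three of "ma", "li" and "mali" at once.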
I have been considering rules roughly similar to:
"No two distinct words can have a common initial substring which is longer
than half the length of one of them and longer than one-third the length of
the other; nor can any two distinct words have a common final substring
which is longer than half as long as one and longer than one-third as long as
the other."
("Length" may be measured in number of segments, or in number of morae, or
in number of syllables, or in number of feet, depending.)
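A rule of this shape is easy to test mechanically. The sketch below is one possible reading, not a definitive one: it assumes length is counted in characters as a stand-in for segments, it reads "longer than" strictly, and the names `common_prefix_len`, `too_long` and `conflict` are hypothetical. The English words at the end are only test data.

```python
def common_prefix_len(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def too_long(shared, len_a, len_b):
    """True if shared > 1/2 of one length and > 1/3 of the other (either order)."""
    return ((2 * shared > len_a and 3 * shared > len_b) or
            (2 * shared > len_b and 3 * shared > len_a))


def conflict(a, b):
    """True if two distinct words break the initial- or final-substring limit."""
    if a == b:
        return False
    prefix = common_prefix_len(a, b)
    suffix = common_prefix_len(a[::-1], b[::-1])  # shared ending, via reversal
    return (too_long(prefix, len(a), len(b)) or
            too_long(suffix, len(a), len(b)))


print(conflict("sing", "singing"))  # True: shared "sing" is all of one word
print(conflict("cat", "cot"))       # False: only one shared character at each end
```

Measuring in morae, syllables or feet would only require replacing the strings with sequences of those units; the comparisons stay the same.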
The thing is, of course, this means that sets of words like:
It also makes sets like
Maybe it should be that "no 'finite' or 'surface' or 'fully-inflected' word should
be an initial or final substring of any (other) _morpheme_" or "... of any
(other) _root_"? (I include "other" because maybe the root form of the word
can occur as a surface form.)
That's a less widely-applicable, hence more permissive rule. For one thing, the
substring has to be _all_ of one of the comparands.
I was also thinking: what if the substring had to be initial or final in only one of
the comparands, but could be medial in the other? Something like
"If any two words share a substring which is an initial or final substring of at
least one of them, and also is (over) half as long as at least one of them and
also (over) one-third as long as the other, then either one of the words is
inflected or derived from the other, or both words are inflected or derived from
the same baseword/wordbase/stem/root."
This modification could still cause difficulties with some sets of compounds;
say, "whiteboard" and "blackboard" and "blackguard".
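To see which pairs this modified rule would single out, here is a rough Python sketch. It rests on several assumptions: length is counted in characters, the function names are hypothetical, and the ambiguous "(over)" is handled with a `strict` flag (`strict=False` reads it as "at least", `strict=True` as "more than"). The code only flags pairs; whether the flagged words really share a base is a question for the lexicon.

```python
def longest_edge_overlap(a, b):
    """Length of the longest prefix or suffix of a that occurs anywhere in b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a[:n] in b or a[-n:] in b:
            return n
    return 0


def flagged(a, b, strict=False):
    """True if the rule would require a and b to be derivationally related."""
    if a == b:
        return False
    # The shared substring must be initial or final in at least one word.
    shared = max(longest_edge_overlap(a, b), longest_edge_overlap(b, a))

    def passes(half_len, third_len):
        if strict:
            return 2 * shared > half_len and 3 * shared > third_len
        return 2 * shared >= half_len and 3 * shared >= third_len

    return passes(len(a), len(b)) or passes(len(b), len(a))


print(flagged("whiteboard", "blackboard"))  # True: "board" is half of each
print(flagged("blackboard", "blackguard"))  # True: "black" is half of each
print(flagged("whiteboard", "blackguard"))  # False: only "ard" is shared
```

Under the strict reading none of these three pairs is flagged, since "board" and "black" are exactly half the length of each ten-letter compound.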
What sort of rule, similar to one or more of those I've mentioned so far,
would actually work for some natlang?
In theory, there's no difference between theory and practice.
In practice, there is.