Re: Word class marking in the wild...
From: Gary Shannon <fiziwig@...>
Date: Tuesday, November 18, 2008, 18:47
--- On Tue, 11/18/08, ROGER MILLS <rfmilly@...> wrote:
> I probably shouldn't say this, but I simply don't
> understand that article.
> How, for example, is "insect" considered
> "verb-like" vs. "noun-like"
> marble?? It escapes me.
"Insect" is closer in phonetic space to where the verbs tend to cluster than to where
the nouns tend to cluster. It's not a thing that can be described in words,
because it's a mathematical feature that only emerges after the sound of the
words is subjected to multi-dimensional feature-space mapping. So asking why
"insect" is more verb-like is asking a question that can only be answered by
referencing that feature space.
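To make "closer in phonetic space" a little more concrete, here is a toy sketch of one simple way to measure closeness: a nearest-centroid test. It is NOT the method from the article. The two dimensions and every coordinate below are invented for illustration only. Real phonological feature spaces have many more axes, but the geometry works the same way.

```python
import math

# Invented 2-D "phonetic feature" coordinates, purely illustrative.
noun_points = [(0.2, 0.8), (0.3, 0.9), (0.1, 0.7)]
verb_points = [(0.8, 0.3), (0.9, 0.2), (0.7, 0.4)]


def centroid(points):
    """Average position of a cluster of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def nearer_class(point, classes):
    """Return the label whose cluster centroid is closest (Euclidean) to point."""
    return min(classes, key=lambda label: math.dist(point, centroid(classes[label])))


insect = (0.65, 0.45)  # hypothetical position for "insect"
print(nearer_class(insect, {"noun": noun_points, "verb": verb_points}))  # verb
```

Note that the answer ("verb") comes out of the distance computation, not out of any verbal rule about what verbs sound like.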
There is no sentence like "Verbs are more blah blah blah than nouns which tend to
foo foo foo." Trying to state the differences in a non-mathematical way only
ends up with meaningless statements.
In other words, it escapes you because you are not looking at the
multi-dimensional scatter diagram; instead, you are looking for an understanding built
out of words, and that doesn't exist. The words of the article don't really tell
us what the difference IS. They only tell us that there is a difference, and how
to look for it in the multi-dimensional scatter diagrams.
However, on the off chance that a cruder measure might suffice, I opened up my English
lexicon database and did some searches. Here's why my results also suggest that
"insect" is more verb-like:
Counting only nouns and verbs, my lexicon database lists 46 English words
ending in "-ect". 57% are unambiguously verbs, 22% are unambiguously
nouns, and the remaining 21% can be either nouns or verbs. If we count only the
unambiguous words, then 72% of them are verbs (57 / (57 + 22) ≈ 0.72). By that
measure, words ending in "-ect" are more "verb-like".
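The same tally can be sketched in a few lines of Python. This is a hypothetical sketch: it assumes a lexicon stored as a mapping from each word to a set of part-of-speech tags ("N", "V"). The handful of entries in `toy` is made up for illustration, with simplified tags, and is not the 46-word list from my database. The last line just re-checks the 72% figure from the percentages above.

```python
def suffix_verb_share(lexicon, suffix):
    """Count unambiguous verbs, unambiguous nouns and N/V-ambiguous words
    ending in `suffix`, plus the verb share among the unambiguous ones."""
    verbs = nouns = ambiguous = 0
    for word, tags in lexicon.items():
        if not word.endswith(suffix):
            continue
        has_v, has_n = "V" in tags, "N" in tags
        if has_v and has_n:
            ambiguous += 1
        elif has_v:
            verbs += 1
        elif has_n:
            nouns += 1
    unambiguous = verbs + nouns
    share = verbs / unambiguous if unambiguous else 0.0
    return verbs, nouns, ambiguous, share


# Illustrative toy lexicon (tags simplified to nouns/verbs only).
toy = {
    "select": {"V"}, "collect": {"V"}, "expect": {"V"},
    "insect": {"N"}, "dialect": {"N"},
    "neglect": {"V", "N"}, "project": {"V", "N"},
    "defect": {"V", "N"}, "object": {"V", "N"},
}

print(suffix_verb_share(toy, "ect"))  # (3, 2, 4, 0.6)
print(round(57 / (57 + 22), 2))       # 0.72
```

Ambiguous words are counted separately rather than split between the classes, which matches the "count only the unambiguous words" measure.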
Now imagine less easily described features, features which are measured on a
sliding scale rather than as a simple binary yes/no, and plot each such feature
on a separate axis of a multi-dimensional scatter diagram. Even without seeing
the diagram, it should be clear that there could be clusters of points in
n-space where verbs predominate, and that "insect" could lie within, or near,
one of those verb-like clusters.
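A rough sketch of the "within, or near to, one of those clusters" idea, using a k-nearest-neighbour vote. I chose this technique for illustration, and nothing here claims it is what the article used. All three feature axes and every value are invented. The data deliberately contains two separate verb clusters, because verbs need not bunch up in just one place.

```python
import math
from collections import Counter


def knn_label(point, labelled_points, k=3):
    """Majority label among the k labelled points nearest (Euclidean) to `point`."""
    nearest = sorted(labelled_points, key=lambda item: math.dist(point, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Invented sliding-scale feature values on three axes.
data = [
    # first verb cluster
    ((0.9, 0.1, 0.2), "verb"), ((0.8, 0.2, 0.1), "verb"), ((0.85, 0.15, 0.2), "verb"),
    # second, separate verb cluster
    ((0.1, 0.9, 0.8), "verb"), ((0.2, 0.8, 0.9), "verb"), ((0.15, 0.85, 0.85), "verb"),
    # noun cluster in between
    ((0.5, 0.5, 0.5), "noun"), ((0.55, 0.45, 0.5), "noun"),
    ((0.45, 0.5, 0.55), "noun"), ((0.5, 0.55, 0.45), "noun"),
]

insect = (0.8, 0.2, 0.2)  # hypothetical position for "insect"
print(knn_label(insect, data))           # verb
print(knn_label((0.5, 0.5, 0.5), data))  # noun
```

Why neighbours rather than one average point per class? In this toy data, the average of all six verb points is about (0.5, 0.5, 0.51), which sits right inside the noun cluster. Looking at the local neighbourhood respects the fact that there may be several verb-like clusters.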