Re: Word class marking in the wild...

> > > I probably shouldn't say this, but I simply don't understand that article. How, for example, is "insect" considered "verb-like" vs. "noun-like" marble?? It escapes me.
"Insect" is closer in phonetic space to where the verbs tend to cluster than to where the nouns tend to cluster. It's not a thing that can be described in words, because it's a mathematical feature that only emerges after the sounds of the words are subjected to multi-dimensional feature-space mapping. So asking why "insect" is more verb-like is asking a question that can only be answered by referencing that feature space. There is no sentence like "Verbs are more blah blah blah than nouns, which tend to foo foo foo." Trying to state the differences in a non-mathematical way only ends up with meaningless statements.

In other words, it escapes you because you are not looking at the multi-dimensional scatter diagram; you are looking for an understanding built out of words, which doesn't exist. The words of the article don't really tell us what the difference IS. They only tell us that there is a difference, and how to look for it in the multi-dimensional scatter diagrams.

However, on the off chance that a cruder measure might suffice, I opened up my English lexicon database and did some searches. Here is why my results also suggest that "insect" is more verb-like:

Counting only nouns and verbs, in English there are (according to my lexicon database) 46 words ending in "-ect". 57% are unambiguously verbs, 22% are unambiguously nouns, and the rest can be either nouns or verbs. If we count only the unambiguous words (57% + 22% = 79% of the total), then 72% of them are verbs. By that measure, words ending in "-ect" are more "verb-like".

Now imagine less easily described features, features which are measured on a sliding scale rather than as a simple binary yes/no, and plot each such feature on a separate axis of a multi-dimensional scatter diagram. Even without seeing the diagram, it should be clear that clusters of points in n-space can exist where verbs predominate, and that "insect" can lie within, or near, one of those verb-like clusters.

--gary
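
P.S. For anyone who wants to repeat the crude suffix count on their own word list, here is a minimal sketch in Python. It is not the query I ran against my database. The function name `suffix_profile`, the tag strings "noun" and "verb", and the six-word `TOY_LEXICON` are all made up for illustration, so the numbers it produces say nothing about the real 46-word result.

```python
def suffix_profile(lexicon, suffix):
    """Count verb-only, noun-only, and noun-or-verb words ending in `suffix`.

    `lexicon` maps each word to the set of word classes it can take.
    Only the (assumed) tags "noun" and "verb" are considered.
    """
    verb_only = noun_only = both = 0
    for word, tags in lexicon.items():
        if not word.endswith(suffix):
            continue
        is_noun = "noun" in tags
        is_verb = "verb" in tags
        if is_noun and is_verb:
            both += 1
        elif is_verb:
            verb_only += 1
        elif is_noun:
            noun_only += 1
    unambiguous = verb_only + noun_only
    # Share of verbs among the words that are not ambiguous.
    verb_share = verb_only / unambiguous if unambiguous else None
    return {
        "verb_only": verb_only,
        "noun_only": noun_only,
        "both": both,
        "verb_share_unambiguous": verb_share,
    }


# Hypothetical toy lexicon, NOT the database mentioned in the post.
TOY_LEXICON = {
    "insect": {"noun"},
    "dialect": {"noun"},
    "detect": {"verb"},
    "inject": {"verb"},
    "select": {"verb"},
    "object": {"noun", "verb"},
    "marble": {"noun"},
}

if __name__ == "__main__":
    print(suffix_profile(TOY_LEXICON, "ect"))
```

Swap in a real lexicon (word mapped to its set of word classes) and the same function gives the verb share for any ending you care to try.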