Re: Unsupervised learning of natural languages

From: Henrik Theiling <theiling@...>
Date: Wednesday, November 2, 2005, 21:59

Hi!

Gary writes:
> ...
> Those rules could then be used generatively to create new words for
> a conlang in the style of whatever vocabulary was supplied as the
> training sample. Thus there could be generative rules to create
> words with an Icelandic flavor, or words with a Tibetan flavor, or
> words with a Korean flavor, etc. And by blending extracted rule
> sets (or equivalently, blending input lexicons) rules could be found
> for generating words with compound or hybrid flavors like
> Russian-Japanese hybrid words, or Polynesian-Hungarian hybrid words.
>
> That sounds like an interesting project!
>...

That would indeed be fun! :-)
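
To make the idea concrete, here is a minimal sketch in Python. Note that it is not the rule-extraction method discussed in this thread: it swaps in a plain letter n-gram (Markov chain) model, which is probably the simplest way to get words "in the flavor of" a sample lexicon. The function names `train` and `generate`, the `order` parameter, and the tiny word lists are my own choices for illustration only.

```python
import random
from collections import defaultdict


def train(words, order=2):
    """For each context of `order` letters, record the letters seen after it.

    '^' pads the start of a word and '$' marks its end (both are assumed
    not to occur in the input words).
    """
    model = defaultdict(list)
    for word in words:
        padded = "^" * order + word + "$"
        for i in range(order, len(padded)):
            model[padded[i - order:i]].append(padded[i])
    return model


def generate(model, order=2, rng=random, max_len=12):
    """Random-walk through the model until the end marker or max_len."""
    context = "^" * order
    letters = []
    while len(letters) < max_len:
        nxt = rng.choice(model[context])
        if nxt == "$":
            break
        letters.append(nxt)
        context = (context + nxt)[-order:]
    return "".join(letters)


# Toy samples, far too small for real use.
hawaiian = ["aloha", "mahalo", "kahuna", "wahine", "ohana"]
hungarian = ["köszönöm", "szép", "magyar", "város", "kenyér"]

# "Blending input lexicons" is then just training on the combined list.
hybrid_model = train(hawaiian + hungarian)
rng = random.Random(2005)
print([generate(hybrid_model, rng=rng) for _ in range(5)])
```

With lexicons this small, the output will mostly reproduce or splice the input words; a real experiment would need a few thousand words per language.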

> > Also, it would be interesting to see what it does for highly
> > inflecting languages like Kalaallisut or Ancient Greek. If it
> > fails here, too, the whole approach would not be too surprising at
> > all, since one would naturally expect these things to fail.
>
> It looks to me as though this method would have no trouble with
> inflected languages. The method extracts rules recursively starting
> at the lowest level and re-writing the graph at a more abstract, or
> generalized level before extracting rules at the next higher
> level. It would have to build its initial graph on the basis of
> individual letters, rather than individual words, however, so that
> the first rules it extracted would be at the level of the inflection
> rules.
>...

But the algorithm is limited to finding context-free rules, so things like vowel harmony or Verner's law cannot be found. On the syntax level, the same holds for word-order phenomena occurring in German or Dutch. In computational linguistics, context-free grammars are insufficient for virtually everything. So although the algorithms are fun to play with, I don't think they are really innovative for linguistics.

**Henrik
