Re: Unsupervised learning of natural languages
From: Henrik Theiling <theiling@...>
Date: Wednesday, November 2, 2005, 13:57
Hi!
Sanghyeon Seo <sanxiyn@...> writes:
> I thought people on this list may be interested in the following paper:
>
> http://www.pnas.org/cgi/content/short/102/33/11629
> http://www.cs.tau.ac.il/~ruppin/pnas_adios.pdf
>
> Unsupervised learning of natural languages
> Zach Solan, David Horn, Eytan Ruppin, and Shimon Edelman
>
> This induces grammar rules from raw data (unsegmented writing,
> continuous speech, etc.), and is also generative and predictive. The
> algorithm is also believed to run in linear time, and is thus
> computationally feasible.
Interesting, I will have to read that.
> Applying this to your conlang and generating a few sentences may be an
> interesting experience... if someone can implement this.
Indeed! :-)
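For a flavour of what even a naive implementation of the general idea looks like, here is a toy sketch that repeatedly replaces the most frequent adjacent pair of symbols in unsegmented input with a fresh rule. This is a Sequitur-style digram scheme, not the ADIOS algorithm from the paper, and the names (`induce`, `expand`) are my own invention:

```python
# Toy grammar induction over an unsegmented symbol sequence:
# repeatedly replace the most frequent adjacent pair with a new rule.
# (Sequitur-style sketch, NOT the paper's ADIOS algorithm.)
from collections import Counter

def induce(seq):
    """Return (compressed sequence, {rule name: (left, right)})."""
    seq, rules, next_id = list(seq), {}, 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            return seq, rules
        pair, count = pairs.most_common(1)[0]
        if count < 2:          # no repeated structure left
            return seq, rules
        name = "R%d" % next_id
        next_id += 1
        rules[name] = pair     # record the new production
        out, i = [], 0
        while i < len(seq):    # rewrite the sequence, left to right
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out

def expand(sym, rules):
    """Generative direction: expand a symbol back to raw text."""
    if sym in rules:
        a, b = rules[sym]
        return expand(a, rules) + expand(b, rules)
    return sym

compressed, rules = induce("the cat sat on the mat")
print(compressed, rules)
print("".join(expand(s, rules) for s in compressed))  # round-trips to the input
```

Expanding the induced rules reconstructs the input exactly, which shows the "generative" direction; the induced rules are of course flat digrams, nothing like real syntax.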
One phrase struck me as strange: '... It has been evaluated on
artificial context-free grammars with thousands of rules, on natural
languages as diverse as English and Chinese, ...'
Hmm! These two languages may not be related, but both have relatively
nice syntactic structures, i.e., tree structures. So 'diverse' is a
euphemism in that sentence. It would be much more interesting to see
whether the approach works for, say, Dutch. If the algorithm tries to
find context-free production rules, it will fail, since Dutch
cross-serial dependencies are the classic example of structure that
context-free rules cannot capture.
Also, it would be interesting to see what it does for highly
inflecting languages like Kalaallisut or Ancient Greek. If it fails
there, too, the whole approach would not be very surprising after all,
since one would naturally expect such cases to fail.
But, ok, these thoughts are premature -- I haven't read the article
yet.
**Henrik