
Re: Unsupervised learning of natural languages

From: tomhchappell <tomhchappell@...>
Date: Thursday, November 3, 2005, 19:51
--- In conlang@yahoogroups.com, Gary Shannon <fiziwig@Y...> wrote:
> --- Henrik Theiling <theiling@A...> wrote:
>
> <snip>
>
> > But the algorithm is limited to finding context free rules, so some
> > things like vowel harmony or Verner's law etc. cannot be found. On
> > the syntax level, the same holds for word ordering phenomena occurring
> > in German or Dutch. In the computational linguistics fields, context
> > free grammars are insufficient for virtually everything. So although
> > the algorithms are fun to play with, they are not really innovative, I
> > think, for linguistics.
> >
> > **Henrik
>
> My reading of the paper (see, in particular, Mode A and Mode B in the
> supporting text where the algorithm is detailed) indicates that the
> algorithm handles the extraction of both context free and context
> sensitive rules.
>
> Quote: "In particular, when ADIOS is iterated, symbols that may have
> been initially very far apart are allowed to exert influence on each
> other, enabling it to capture long-range syntactic dependencies such as
> agreement (Fig. 2F)."
>
> Indicating that even long-range context is taken into account.
>
> --gary

Right. In Mode B, a string is treated as a new node only in contexts where MEX decides it is "significant"; likewise, two nodes are treated as equivalent only in contexts where MEX decides they are both "significant". So, in Mode B, the algorithm can be context-sensitive.

Also, the context window can be set wider. In their test runs it was always 5 or less. I guess that would be as if all their rewrite rules had, at most, the form

    ABCDE --> FGHIJ

where each of those letters could be any arbitrary grammar symbol (a terminal or a non-terminal), or absent, and not necessarily distinct from the others.

But the context window width was a parameter that could be reset at run time. If it were set to 7 or more, I think it might start finding fairly "long-distance" dependencies.

Tom H.C. in MI
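
To make the window-width point concrete, here is a toy sketch in Python. It is not ADIOS or MEX, and it does no significance testing; it only shows which pairs of symbols can ever be seen together inside a window of a given width. The function name pairs_within_window and the example sentence are made up for illustration.

```python
def pairs_within_window(tokens, width):
    """Return every ordered (left, right) pair of tokens that fits
    inside at least one run of `width` consecutive tokens.

    Toy illustration only: a real learner would also have to decide
    which of these pairs are statistically significant.
    """
    seen = set()
    for i in range(len(tokens)):
        # A window starting at i covers positions i .. i + width - 1.
        for j in range(i + 1, min(i + width, len(tokens))):
            seen.add((tokens[i], tokens[j]))
    return seen


# Hypothetical example: plural "dogs" must agree with "bark",
# and the two words sit six positions apart.
tokens = "the dogs that live next door always bark".split()

print(("dogs", "bark") in pairs_within_window(tokens, 5))  # False
print(("dogs", "bark") in pairs_within_window(tokens, 7))  # True
```

With a width of 5 the subject and its verb never share a window, so no rule built from that window alone could relate them; at width 7 they do. Iterating the algorithm, as the quoted passage describes, is a separate way of bringing distant symbols together and is not modelled here.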
