
Re: Unsupervised learning of natural languages

From: tomhchappell <tomhchappell@...>
Date: Friday, November 4, 2005, 22:01
--- In conlang@yahoogroups.com, Henrik Theiling <theiling@A...> wrote:
> [snip]
> ... The six given languages have a relatively
> context-free syntax structure with nicely embedded sub-phrases. I
> merely said I would have been more surprised of a working algorithm
> if they had tested a more interesting language. E.g. Dutch, which
> has a very funny verb order in embedded phrases:
> [snip]
> ...
> ... the final structure contains the subjects in a row
> followed by the verbs in the same order. For arbitrarily deep
> nesting, this cannot be generated with a context-free grammar.
> Further, with a given context length, you can only generate a fixed
> number of reversals, so I think the grammar structure they are
> generating is just not suited for Dutch und thus for natural
> language in general...
> [snip]
> I think production and rewriting rules are not the perfect means for
> natural language processing, since even context free grammars are
> too much by allowing arbitrary nesting, which the human brain
> doesn't, while on the other hand, they are too restricted for
> structures like in Dutch.
> Further, there are language with free word order so even searching
> syntax rules for the order of words is an algorithmic guide-line and
> thus a supervision.
> [snip]
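For readers who want the two verb orders side by side, here is a minimal sketch in Python. It is not taken from the paper or from either poster; the function names and the placeholder tokens `s1`, `v1`, etc. are hypothetical. It only illustrates the ordering patterns under discussion (nested versus "subjects in a row followed by the verbs in the same order") and checks whether the subject-verb pairings cross. It is not a proof about context-free grammars.

```python
def nested(subjects, verbs):
    """Centre-embedded order: s1 s2 s3 v3 v2 v1 (verbs in reverse order)."""
    return subjects + verbs[::-1]


def cross_serial(subjects, verbs):
    """Order described in the quote: s1 s2 s3 v1 v2 v3 (verbs in the same order)."""
    return subjects + verbs


def dependencies(n, crossed):
    """Return (subject_position, verb_position) pairs for n subjects + n verbs."""
    if crossed:
        # Subject i pairs with the i-th verb after the block of subjects.
        return [(i, n + i) for i in range(n)]
    # Subject i pairs with the verb mirrored around the centre.
    return [(i, 2 * n - 1 - i) for i in range(n)]


def arcs_cross(pairs):
    """True if any two dependency arcs cross, i.e. i < k < j < l."""
    return any(i < k < j < l for (i, j) in pairs for (k, l) in pairs)


subjects = ["s1", "s2", "s3"]
verbs = ["v1", "v2", "v3"]
print(" ".join(nested(subjects, verbs)))
print(" ".join(cross_serial(subjects, verbs)))
print(arcs_cross(dependencies(3, crossed=False)))
print(arcs_cross(dependencies(3, crossed=True)))
```

In the nested order every subject-verb arc sits inside the previous one, which is the pattern a tree of embedded sub-phrases yields directly; in the cross-serial order the arcs cross one another.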
From my reading of the paper, I do not see why the algorithm is limited to grammars governable by rewrite rules, whether context-free or context-sensitive. (BTW, I and at least one other poster were trying to make the point that the algorithm can, specifically, handle context-sensitive grammars; it is still somewhat unclear, Henrik, whether or not you got that point.)

What is clear to me, from reading the paper, are the following two things:

1) The things discovered by the algorithm follow the Bloomfieldian methodology and model of
 * discovery procedures,
 * form classes,
 * distributional equivalence, and
 * frames; and,
2) The internal structures in the grammar discovered by the algorithm will be tree-like.

Henrik, I think your point about "... any fixed context length ... (etc.)" is a valid one. Do you think Dutch, or any other natlang, is not tree-like? Perhaps the "non-configurational" languages?

What interested me most about all of these papers, including this one, was the fact that Bloomfield's program had never been successfully applied to any natlang by hand within a human lifespan. These "feasible learnability" notions seem to make it possible to limit the nature of possible grammars to something that can be learned ("by hand") within a few years.

Tom H.C. in MI
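As a rough illustration of what "frames", "distributional equivalence" and "form classes" mean in the post above, here is a minimal Python sketch. It is not the algorithm from the paper under discussion; the toy corpus, the function names, and the choice of a one-word-left, one-word-right frame are all assumptions made for the example. Words that can fill the same frame are grouped together as a candidate form class.

```python
from collections import defaultdict


def frames_by_word(sentences):
    """Map each word to the set of (left neighbour, right neighbour) frames it occurs in."""
    frames = defaultdict(set)
    for sentence in sentences:
        # Hypothetical boundary markers so that edge words also get a frame.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            frames[tokens[i]].add((tokens[i - 1], tokens[i + 1]))
    return frames


def form_classes(sentences):
    """Group words by shared frame; keep only frames filled by more than one word."""
    fillers = defaultdict(set)
    for word, word_frames in frames_by_word(sentences).items():
        for frame in word_frames:
            fillers[frame].add(word)
    return {frame: words for frame, words in fillers.items() if len(words) > 1}


# Toy corpus, invented for this example only.
corpus = ["the cat sleeps", "the dog sleeps", "the cat runs", "the dog runs"]
for frame, words in sorted(form_classes(corpus).items()):
    print(frame, sorted(words))
```

On this toy corpus, "cat" and "dog" fill the same frames, as do "sleeps" and "runs". A real discovery procedure would need far more data and some statistical criterion for when two distributions count as equivalent.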
