Re: Unsupervised learning of natural languages
From: tomhchappell <tomhchappell@...>
Date: Friday, November 4, 2005, 22:01
--- In conlang@yahoogroups.com, Henrik Theiling <theiling@A...> wrote:
> [snip]
> ... The six given languages have a relatively
> context-free syntax structure with nicely embedded sub-phrases. I
> merely said I would have been more surprised by a working algorithm
> if they had tested a more interesting language. E.g. Dutch, which
> has a very funny verb order in embedded phrases:
> [snip]
> ...
> ... the final structure contains the subjects in a row
> followed by the verbs in the same order. For arbitrarily deep
> nesting, this cannot be generated with a context-free grammar.
> Further, with a given context length, you can only generate a fixed
> number of reversals, so I think the grammar structure they are
> generating is just not suited for Dutch and thus for natural
> language in general...
> [snip]
> I think production and rewriting rules are not the perfect means for
> natural language processing, since even context free grammars are
> too much by allowing arbitrary nesting, which the human brain
> doesn't, while on the other hand, they are too restricted for
> structures like in Dutch.
> Further, there are languages with free word order, so even searching
> for syntax rules about the order of words is an algorithmic guideline
> and thus a form of supervision.
> [snip]
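To make the nested vs. cross-serial contrast concrete, here is a toy sketch in Python. The token names (N1, N2, ... for subjects and V1, V2, ... for their verbs) and all function names are hypothetical, not from the paper. Nested dependencies (N1 N2 N3 V3 V2 V1) can be paired with a last-in-first-out stack, the kind of memory a context-free (pushdown) recognizer has; the Dutch-style order Henrik describes (N1 N2 N3 V1 V2 V3) needs first-in-first-out pairing instead. This only illustrates the pairing pattern; it is not a proof about Dutch.

```python
from collections import deque


def nested_order(pairs):
    """Subjects in order, then verbs reversed: N1 N2 N3 V3 V2 V1."""
    return [s for s, _ in pairs] + [v for _, v in reversed(pairs)]


def cross_serial_order(pairs):
    """Subjects in order, then verbs in the same order: N1 N2 N3 V1 V2 V3."""
    return [s for s, _ in pairs] + [v for _, v in pairs]


def same_index(subject, verb):
    """Hypothetical agreement test: N2 goes with V2, and so on."""
    return subject[1:] == verb[1:]


def match(tokens, agree, lifo):
    """Pair each verb with a pending subject.

    lifo=True takes the most recent subject (a stack, as for nested
    dependencies); lifo=False takes the oldest one (a queue, as for
    cross-serial dependencies). Returns True only if every verb finds
    an agreeing subject and no subject is left over.
    """
    pending = deque()
    for tok in tokens:
        if tok.startswith("N"):
            pending.append(tok)
            continue
        if not pending:
            return False
        subject = pending.pop() if lifo else pending.popleft()
        if not agree(subject, tok):
            return False
    return not pending
```

With pairs N1/V1 through N3/V3, the stack version accepts N1 N2 N3 V3 V2 V1 and rejects N1 N2 N3 V1 V2 V3; the queue version does the opposite.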
From my reading of the paper, I do not see why the algorithm is
limited to grammars governable by rewrite rules, whether context-free
or context-sensitive. (BTW, at least one other poster and I were
trying to make the point that the algorithm can, specifically, handle
context-sensitive grammars; it is still somewhat unclear to me, Henrik,
whether you got that point.)
What is clear to me from reading the paper are the following two
things:
1) The things discovered by the algorithm follow the Bloomfieldian
methodology and model of
* discovery procedures,
* form classes,
* distributional equivalence, and
* frames; and,
2) The internal structures in the grammar discovered by the algorithm
will be tree-like.
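For readers unfamiliar with the Bloomfieldian terms in point 1, here is a toy Python sketch of frames, distributional equivalence and form classes. It is not the algorithm from the paper: it assumes one-word left/right frames, treats two words as equivalent only when their frame sets are identical, and the function names and the boundary tokens "<s>" and "</s>" are hypothetical.

```python
from collections import defaultdict


def frames(corpus):
    """Map each word to the set of (left, right) frames it occurs in.

    Sentence boundaries are marked with the hypothetical tokens
    '<s>' and '</s>'.
    """
    seen = defaultdict(set)
    for sentence in corpus:
        toks = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(toks) - 1):
            seen[toks[i]].add((toks[i - 1], toks[i + 1]))
    return seen


def form_classes(corpus):
    """Group words with identical frame sets (strict distributional equivalence)."""
    by_frames = defaultdict(list)
    for word, fs in frames(corpus).items():
        by_frames[frozenset(fs)].append(word)
    return sorted(sorted(words) for words in by_frames.values())
```

On the four sentences "the cat sleeps", "the dog sleeps", "a cat runs" and "a dog runs", form_classes groups {a, the}, {cat, dog} and {runs, sleeps}. Identical frame sets are rare in real text, so a practical discovery procedure would presumably have to relax this to partial overlap.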
Henrik, I think your point about "... any fixed context length ...
(etc.)" is a valid one.
Do you think Dutch, or any other natlang, is not tree-like? Perhaps
the "non-configurational" languages?
What interested me most about all of these papers, including this one,
was the fact that Bloomfield's program had never been successfully
applied to any natlang by hand within a human lifespan.
These "feasible learnability" notions seem to make it possible to
limit the nature of possible grammars to something that can be
learned ("by hand") within a few years.
Tom H.C. in MI