Re: "Self-Segregating Syntax"?
From: And Rosta <and.rosta@...>
Date: Friday, April 21, 2006, 1:22
Eldin Raigmore, On 18/04/2006 23:19:
> It looks like there are the following three main ways to delimit the
> phrases or other word-groups (some systems use more than one at the same
> time):
>
> 1. Mark the beginning of every such group -- the group will then end just
> before the next beginning-of-a-group at the same or a higher level.
>
> 2. Mark the end of every such group -- the group will then begin just after
> the last previous end-of-a-group at the same or a higher level.
>
> 3. Encode the length of the group at one (or both) of its margins: Either
> 3a At the beginning of every group; or,
> 3b At the end of every group;
> or both.
>
> Has anyone come up with any other ideas? Or run into ideas someone else
> has come up with?
>
> Has anyone gotten any further than X-1 on any such scheme? Or does anyone
> know of any natlang or successful conlang (possibly someone else's) which
> is more complete in this regard?
& Eldin Raigmore, On 19/04/2006 00:51:
> On Mon, 17 Apr 2006 19:16:45 +0100, And Rosta <and.rosta@...> wrote:
> [snip]
>> My conlang, Livagian, has unambiguous syntax parsed
>> incrementally with no lookahead,
[...]
>
> This is extremely interesting.
>
> Can you point me to the info on Livagian? Especially the syntax,
> especially the parts that make it "unambiguous"?
>
> Is there a place I can look at your "unambiguous parser" too, if you have
> one?
Nothing on Livagian is published, largely because ever since its
inception almost 30 years ago it has been in a perpetual state of redesign
as I find new ways to improve it (which invariably entail the destruction
of most of the work done up to that point...).
As for the unambiguous parser, the details have naturally changed greatly
over time, but there are some constants.
I. I work with a Dependency Grammar model of syntax, in which syntactic
structure is a tree (without crossing branches) and there is no
distinction in type between furcating nodes, unary branching nodes and
terminal nodes. (This is mainly a notational issue, but it makes for
maximal simplicity & straightforwardness.)
II. I stick to the principle of no lookahead, which helps to ensure that
any parsing algorithm is straightforward & unmindboggling.
III. Not all nodes need be expressed phonologically.
In the antepenultimate incarnation of the syntax, all mothers preceded
their daughters, and the lexicogrammar specified for each node how many
daughters it has. This is a variety of your [3a] above.
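A minimal sketch of how a mother-first scheme of this kind can be
parsed with no lookahead. It is an illustration only, not Livagian's
actual parser: the names `Node`, `parse_prefix`, `show` and `ARITY`,
and the toy words in the lexicon, are all assumptions made up for the
example. Each word is attached the moment it is read, because the
stack always says which open mother is still waiting for a daughter.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    daughters: list = field(default_factory=list)


# Hypothetical lexicon: word -> number of daughters it takes.
ARITY = {"see": 2, "big": 1, "cat": 0, "dog": 0}


def parse_prefix(words, arity):
    """Build a tree from mother-first order, one word at a time."""
    root = None
    open_nodes = []  # stack of [node, daughters still expected]
    for word in words:
        if root is not None and not open_nodes:
            raise ValueError(f"extra word after complete tree: {word!r}")
        node = Node(word)
        if root is None:
            root = node
        else:
            # Attach to the innermost mother that still lacks a daughter.
            parent = open_nodes[-1]
            parent[0].daughters.append(node)
            parent[1] -= 1
            if parent[1] == 0:
                open_nodes.pop()
        if arity[word] > 0:
            open_nodes.append([node, arity[word]])
    if root is None or open_nodes:
        raise ValueError("incomplete tree")
    return root


def show(node):
    """Render a tree as word(daughter, daughter, ...)."""
    if not node.daughters:
        return node.word
    return f"{node.word}({', '.join(show(d) for d in node.daughters)})"


print(show(parse_prefix("see big cat dog".split(), ARITY)))
# prints: see(big(cat), dog)
```

The parser never inspects a word beyond the one it is currently
reading; the daughter counts alone determine where each group ends.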
In the penultimate incarnation of the syntax, mothers could precede or
follow their daughters. The lexicogrammar specified for each node how
many daughters it has, and which of four positions it occupies: it
follows its mother, has no mother, precedes its mother as first
daughter, or precedes its mother as nonfirst daughter. This is a
mixture of your [1] & [3] (but with the 'length' encoded on what you
might call the 'head').
In the current incarnation of the syntax, mothers follow daughters, all
mothers have exactly two daughters, and mothers form a closed lexical
class (which happens to be expressed inflectionally). This could be
classed as your [2] or your [3].
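A minimal sketch of a daughters-first, strictly binary scheme of this
general shape, again parsed with no lookahead. It is not the actual
Livagian algorithm: the function name `parse_postfix`, the mother
markers "AND" and "WITH", and the toy words are assumptions invented
for the example, and mothers are treated here as separate tokens
rather than as inflections. Every mother simply combines the two most
recently completed subtrees.

```python
def parse_postfix(words, mothers):
    """Build a binary tree from daughters-first order.

    `mothers` is the closed class of mother words (hypothetical here).
    Leaves are returned as strings, mothers as (word, left, right).
    """
    stack = []
    for word in words:
        if word in mothers:
            if len(stack) < 2:
                raise ValueError(f"{word!r} needs two daughters")
            right = stack.pop()
            left = stack.pop()
            stack.append((word, left, right))
        else:
            stack.append(word)
    if len(stack) != 1:
        raise ValueError("words do not form a single tree")
    return stack[0]


tree = parse_postfix("cat dog AND bone WITH".split(), {"AND", "WITH"})
print(tree)
# prints: ('WITH', ('AND', 'cat', 'dog'), 'bone')
```

Because every mother has exactly two daughters, the mother word both
marks the end of its group (as in [2]) and implicitly fixes the
group's length (as in [3]).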
Anyway, it should be clear from the above that the aim is always to
find an unambiguous algorithm for building a tree without lookahead.
And the solution always involves a combination of constraints on tree
shape plus lexicogrammatical information about the combinatorial
properties of individual nodes. It's easy to find algorithms that work:
the design challenge is to find the optimal solution (which needs to
factor in compositional semantics).
--And.