Conlang: Re: "Self-Segregating Syntax"? (And Rosta, Apr 21 '06, 1:22)

Re: "Self-Segregating Syntax"?

From:	And Rosta <and.rosta@...>
Date:	Friday, April 21, 2006, 1:22

Eldin Raigmore, On 18/04/2006 23:19:

> It looks like there are the following three main ways to delimit the
> phrases or other word-groups (some systems use more than one at the same
> time);
>
> 1. Mark the beginning of every such group -- the group will then end just
> before the next beginning-of-a-group at the same or a higher level.
>
> 2. Mark the end of every such group -- the group will then begin just after
> the last previous end-of-a-group at the same or a higher level.
>
> 3. Encode the length of the group at one (or both) of its margins: Either
> 3a At the beginning of every group; or,
> 3b At the end of every group;
> or both.
>
> Has anyone come up with any other ideas? Or run into ideas someone else
> has come up with?
>
> Has anyone gotten any further than X-1 on any such scheme? Or does anyone
> know of any natlang or successful conlang (possibly someone else's) which
> is more complete in this regard?

& Eldin Raigmore, On 19/04/2006 00:51:

> On Mon, 17 Apr 2006 19:16:45 +0100, And Rosta <and.rosta@...> wrote:
> [snip]
>> My conlang, Livagian, has unambiguous syntax parsed
>> incrementally with no lookahead,

[...]

> This is extremely interesting.
>
> Can you point me to the info on Livagian? Especially the syntax,
> especially the parts that make it "unambiguous"?
>
> Is there a place I can look at your "unambiguous parser" too, if you have
> one?

Nothing on Livagian is published, largely because ever since its inception almost 30 years ago it has been in a perpetual state of redesign as I find new ways to improve it (which invariably entail the destruction of most of the work done up to that point...).

As for the unambiguous parser, the details have naturally changed greatly over time, but there are some constants.

I. I work with a Dependency Grammar model of syntax, in which syntactic structure is a tree (without crossing branches) and there is no distinction in type between furcating nodes, unary branching nodes and terminal nodes. (This is mainly a notational issue, but it makes for maximal simplicity & straightforwardness.)

II. I stick to the principle of no lookahead, which helps to ensure that any parsing algorithm is straightforward & unmindboggling.

III. Not all nodes need be expressed phonologically.

In the antepenultimate incarnation of the syntax, all mothers preceded their daughters, and the lexicogrammar specified for each node how many daughters it has. This is a variety of your [3a] above.

In the penultimate incarnation of the syntax, mothers could precede or follow their daughters. The lexicogrammar specified for each node how many daughters it has, and whether it follows its mother or has no mother or precedes its mother as first daughter or precedes its mother as nonfirst daughter. This is a mixture of your [1] & [3] (but with the 'length' encoded on what you might call the 'head').

In the current incarnation of the syntax, mothers follow daughters, all mothers have exactly two daughters, and mothers form a closed lexical class (which happens to be expressed inflectionally). This could be classed as your [2] or your [3].

Anyway, it should be clear from the above that the aim is always to find an unambiguous algorithm for building a tree without lookahead. And the solution always involves a combination of constraints on tree shape plus lexicogrammatical information about the combinatorial properties of individual nodes. It's easy to find algorithms that work: the design challenge is to find the optimal solution (which needs to factor in compositional semantics).

--And.
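
The mother-first scheme (daughter count given by the lexicon) and the mother-last scheme (every mother binary, mothers a closed class) can each be sketched as a single left-to-right pass that never looks at a later word. This is only an illustration under stated assumptions, not Livagian's actual parser or lexicon, which are unpublished: the toy words, the `arity` table, the `mothers` set, and the function names are all invented for the example, and mothers are written here as separate words rather than inflections.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    daughters: list = field(default_factory=list)


def parse_mother_first(words, arity):
    """Mothers precede daughters; arity[word] is the word's daughter count."""
    root = None
    open_nodes = []  # stack of [node, number of daughters still expected]
    for word in words:
        node = Node(word)
        if root is None:
            root = node
        elif not open_nodes:
            raise ValueError(f"tree already complete before {word!r}")
        else:
            # Attach to the most recent node that still lacks a daughter.
            open_nodes[-1][0].daughters.append(node)
            open_nodes[-1][1] -= 1
            if open_nodes[-1][1] == 0:
                open_nodes.pop()
        if arity[word] > 0:
            open_nodes.append([node, arity[word]])
    if root is None or open_nodes:
        raise ValueError("incomplete tree")
    return root


def parse_mother_last(words, mothers):
    """Mothers follow daughters; every mother takes exactly two daughters."""
    stack = []
    for word in words:
        if word in mothers:
            if len(stack) < 2:
                raise ValueError(f"{word!r} needs two preceding daughters")
            right = stack.pop()
            left = stack.pop()
            stack.append(Node(word, [left, right]))
        else:
            stack.append(Node(word))
    if len(stack) != 1:
        raise ValueError("words do not form a single tree")
    return stack[0]


def bracket(node):
    """Render a tree as mother(daughter, daughter, ...)."""
    if not node.daughters:
        return node.word
    return f"{node.word}({', '.join(bracket(d) for d in node.daughters)})"


# Hypothetical toy lexicon: 'say' and 'see' each take two daughters.
arity = {"say": 2, "see": 2, "ana": 0, "ben": 0, "cat": 0}
print(bracket(parse_mother_first(["say", "ana", "see", "ben", "cat"], arity)))
# prints: say(ana, see(ben, cat))

# Hypothetical closed class of mothers, written in capitals.
print(bracket(parse_mother_last(["ana", "ben", "cat", "SEE", "SAY"], {"SEE", "SAY"})))
# prints: SAY(ana, SEE(ben, cat))
```

In both functions each word is attached, or stacked, using only what has already been read, which is the sense of "no lookahead" assumed here. The penultimate, mixed-order scheme is not sketched, since the post does not give enough detail to reconstruct it.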