Re: "Self-Segregating Syntax"?
From: And Rosta <and.rosta@...>
Date: Sunday, April 23, 2006, 16:15
Eldin Raigmore, On 21/04/2006 19:57:
> On Fri, 21 Apr 2006 02:21:57 +0100, And Rosta <and.rosta@...> wrote:
> [snip]
>> As for the unambiguous parser, the details have naturally changed greatly
>> over time, but there are some constants.
>> I. I work with a Dependency Grammar model of syntax, in which syntactic
>> structure is a tree (without crossing branches) and there is no
>> distinction in type between furcating nodes, unary branching nodes and
>> terminal nodes. (This is mainly a notational issue, but it makes for
>> maximal simplicity & straightforwardness.)
>> II. I stick to the principle of no lookahead, which helps to ensure that
>> any parsing algorithm is straightforward & unmindboggling.
>> III. Not all nodes need be expressed phonologically.
>>
>> In the antepenultimate incarnation of the syntax, all mothers preceded
>> their daughters, and the lexicogrammar specified for each node how many
>> daughters it has. This is a variety of your [3a] above.
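(To make that scheme concrete: a minimal sketch in Python, with a toy
lexicon of my own invention -- the words and arities are purely
illustrative. Since each word announces its daughter-count, the tree can
be built left to right with no lookahead:)

    # Toy lexicon: each word's entry records how many daughters it takes.
    LEXICON = {"pa": 2, "ko": 1, "mi": 0, "su": 0}

    def parse_prefix(tokens):
        """Build a tree from a mother-before-daughters stream, no lookahead."""
        it = iter(tokens)

        def build():
            word = next(it)  # consume exactly one token
            daughters = [build() for _ in range(LEXICON[word])]
            return (word, daughters)

        tree = build()
        if next(it, None) is not None:
            raise ValueError("trailing material: not a single tree")
        return tree

    # "pa mi ko su" parses as pa(mi, ko(su)): each word tells the
    # parser exactly how many daughters to expect next.
    print(parse_prefix("pa mi ko su".split()))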
>>
>> In the penultimate incarnation of the syntax, mothers could precede or
>> follow their daughters. The lexicogrammar specified for each node how
>> many daughters it has, and whether it follows its mother or has no
>> mother or precedes its mother as first daughter or precedes its mother
>> as nonfirst daughter. This is a mixture of your [1] & [3] (but with the
>> 'length' encoded on what you might call the 'head').
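(This scheme is fiddlier to mechanize. Below is a rough sketch of one way
it might go -- the toy lexicon is invented, and the greedy rule by which
an incoming node claims completed subtrees parked on the stack is a
simplifying convention of mine, not necessarily how Livagian actually
resolved such cases:)

    from dataclasses import dataclass, field

    FOLLOWS, ROOT, FIRST, NONFIRST = "follows", "root", "first", "nonfirst"

    # Invented toy lexicon: (daughter count, relation to mother).
    LEX = {
        "ta": (0, FIRST),     # precedes its mother as first daughter
        "ne": (0, NONFIRST),  # precedes its mother as a later daughter
        "mi": (0, FOLLOWS),   # follows its mother
        "ko": (1, FOLLOWS),   # one daughter; follows its mother
        "pa": (2, ROOT),      # two daughters; the root
    }

    @dataclass
    class Node:
        word: str
        mode: str
        need: int                  # following daughters still expected
        daughters: list = field(default_factory=list)
        waiting: bool = False      # subtree complete, mother yet to come

    def parse(tokens):
        stack, result = [], None

        def resolve(n):            # called when n's subtree is complete
            nonlocal result
            if n.mode in (FIRST, NONFIRST):
                n.waiting = True   # park it until its mother arrives
                stack.append(n)
            elif n.mode == ROOT:
                result = n
            # FOLLOWS: already attached; completing n may complete its mother
            elif stack and not stack[-1].waiting and stack[-1].need == 0:
                resolve(stack.pop())

        for word in tokens:
            arity, mode = LEX[word]
            n = Node(word, mode, arity)
            # Greedily claim parked subtrees as preceding daughters,
            # stopping once a "first daughter" has been claimed.
            while n.need and stack and stack[-1].waiting:
                d = stack.pop()
                n.daughters.insert(0, d)
                n.need -= 1
                if d.mode == FIRST:
                    break
            if mode == FOLLOWS:    # attach to the nearest open node
                if not stack or stack[-1].waiting:
                    raise ValueError(f"{word} has no open mother")
                stack[-1].daughters.append(n)
                stack[-1].need -= 1
            if n.need:
                stack.append(n)    # open: still expects following daughters
            else:
                resolve(n)
        if stack or result is None:
            raise ValueError("ill-formed token stream")
        return result

    # "ta pa ko mi" parses as pa(ta, ko(mi)): ta waits, pa claims it and
    # stays open for one more daughter, ko attaches and opens, and mi
    # completes ko, which in turn completes pa.
    print(parse("ta pa ko mi".split()))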
>>
>> In the current incarnation of the syntax, mothers follow daughters, all
>> mothers have exactly two daughters, and mothers form a closed lexical
>> class (which happens to be expressed inflectionally). This could be
>> classed as your [2] or your [3].
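(This one is the easiest to mechanize: daughters-before-mother with fixed
arity 2 is just a Reverse Polish parse over a stack. A minimal sketch,
again with an invented toy lexicon, in which operatorhood is the only
thing that needs encoding:)

    # Toy lexicon: True = mother, taking exactly two preceding daughters.
    # Everything else is a terminal.
    OPERATOR = {"pa": True, "ko": True, "mi": False, "su": False, "ta": False}

    def parse_postfix(tokens):
        """Daughters-before-mother, all mothers binary: a plain RPN parse."""
        stack = []
        for word in tokens:
            if OPERATOR[word]:
                right = stack.pop()       # second daughter
                left = stack.pop()        # first daughter
                stack.append((word, [left, right]))
            else:
                stack.append((word, []))  # terminal: no daughters
        if len(stack) != 1:
            raise ValueError("not a single tree")
        return stack[0]

    # "mi su ko ta pa" parses as pa(ko(mi, su), ta).
    print(parse_postfix("mi su ko ta pa".split()))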
>>
>> Anyway, it should be clear from the above that the aim is always to
>> find an unambiguous algorithm for building a tree without lookahead.
>> And the solution always involves a combination of constraints on tree
>> shape plus lexicogrammatical information about the combinatorial
>> properties of individual nodes. It's easy to find algorithms that work:
>> the design challenge is to find the optimal solution (which needs to
>> factor in compositional semantics).
[...]
> I wish I could see it in operation;
It operates only in my head... No doubt I could fetch examples out of
my head, though...
> also I'd like a little more detail about the generalities of
> your techniques you mentioned above.
I'm not sure exactly what you're asking for. Give me some idea about
what counts as "details about the generalities", and I will endeavour
to oblige...
> What is Dependency Grammar, exactly? and what publications describe it
> best? and can you detail a little better, perhaps with some examples, why
> it helps out on this question?
Some quick googling reveals online expositions to be surprisingly sparse.
The first few paragraphs of http://www.ilc.cnr.it/EAGLES96/synlex/node15.html
have a little. Also http://en.wikipedia.org/wiki/Link_grammar, a version of
DG, which is actually closer to current Livagian syntax (in which the
mother--daughter asymmetry is redundant). Googling shows that DG
is especially popular with people doing parsing, because it lends itself
to simple algorithms & does away with lots of representational cruft.
To get a sense of a fairly comprehensively worked out DG of English, I
recommend Word Grammar (which is the theory I grew up in and is
ancestral to what I currently do (professionally, I mean, not
conlinguistically)).
An intro page:
http://www.phon.ucl.ac.uk/home/dick/wg.htm
Intro page to a Word Grammar encyclopedia:
http://www.phon.ucl.ac.uk/home/dick/enc-gen.htm
Syntax section of the encyclopedia [I recommend a browse through this]:
http://www.phon.ucl.ac.uk/home/dick/enc/syntax.htm
I'm not aware of any book treatments of DG that are good enough to warrant
you seeking them out in the library. That's really because there's not
much to DG, since it is (largely) a notational variant of phrase structure
grammar.
> -----
>
> In Category Grammar, which I understand is "equivalent", somehow, to Tree-
> Adjoining Grammar,
"Categorial Grammar"? (There is also a version of Systemic Grammar called
"Scale and Category Grammar", but this doesn't sound like what you're
describing.)
> a member of a non-elementary category is an operator
> that intakes a fixed-length list of fixed-position operands, each of a
> particular previously-defined category, and outputs a member of some
> previously-defined category.
This is broadly how DG works.
> In natlangs, it appears that the most popular positions for the operator
> (within the list of operands), are the following:
> 1. Immediately before the first operand.
> 2. Immediately after the first operand.
> 3. Immediately before the last operand.
> 4. Immediately after the last operand.
>
> Position 1 is "prefix position", also known as "Polish Notation".
> Position 4 is "postfix position", also known as "Reverse Polish Notation".
> Positions 2 and 3 are "infix position".
>
> Obviously:
> if there is only one operand, then
> Position 1 = Position 3 and Position 2 = Position 4;
> and if there are only two operands, then
> Position 2 = Position 3.
>
> Also, unless there are more than three operands, there is no position which
> is _not_ on the above list.
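To make the four positions concrete (a worked example of my own): for a
three-operand operator O over operands a b c, the four slots come out as
follows.

    def place(operator, operands, position):
        """Insert the operator at one of the four favoured slots."""
        slot = {1: 0,                    # immediately before the first operand
                2: 1,                    # immediately after the first operand
                3: len(operands) - 1,    # immediately before the last operand
                4: len(operands)}[position]
        seq = list(operands)
        seq.insert(slot, operator)
        return " ".join(seq)

    for p in (1, 2, 3, 4):
        print(p, "->", place("O", ["a", "b", "c"], p))
    # 1 -> O a b c   (prefix: Polish Notation)
    # 2 -> a O b c   (infix)
    # 3 -> a b O c   (infix)
    # 4 -> a b c O   (postfix: Reverse Polish Notation)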
>
> -----
>
> The position that the operator takes among its operands, is part of the
> definition of the operator-type; as is the number of its operands, and as
> are the types of its operands.
>
> It sounds like you're saying that in your conlangs these facts about the
> operator-type are always "phonologically coded" into the word for a
> particular operator.
>
> Have I understood you?
Pretty much. They're not necessarily directly phonologically coded (e.g.
with a morpheme meaning "has 3 operands"), but the phonological form
serves as the address for an entry in the lexicon, and the lexical entry
will say "has 3 operands".
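(In implementation terms -- my gloss, with an invented entry, not a
description of any real Livagian machinery -- the form is just a key, and
the combinatorial facts live in the entry it addresses:)

    # The phonological form addresses an entry; the entry, not the form
    # itself, records the combinatorial facts.
    LEXICON = {"zo": {"operands": 3}}

    def operands_of(phonological_form):
        return LEXICON[phonological_form]["operands"]

    print(operands_of("zo"))  # -> 3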
None of my syntaxes have ever bothered with encoding the type of the
operands. Of the three syntaxes I described above, the first had operators
always before the first operand, and the operator encoded the number of
operands. The second had the operator (still encoding the number of
operands) freely ordered relative to the operands, but each operand
encoded its relation to the operator. And in the current one, all
operators have two operands and follow their operands, so only simple
operatorhood needs to be encoded.
--And.