
Re: XML for linguists?

From:Brook Conner <nellardo@...>
Date:Tuesday, November 9, 1999, 15:10
Charles writes:
 > I'm wondering if there is or should be some kind of
 > XML definition for language parsing.

You mean to mark up text according to the grammar of the language that
the text is in?

 > http://www.w3.org/XML/
 >
 > Anybody know anything about this?

I'm sure there are lots of people on the list who know more than I do,
though I don't consider myself exactly a slouch in this area. David
Durand comes to mind as someone who has spent lots of time with the
Brown CHUG (Computers in the Humanities Users' Group) and the Brown
STG (Scholarly Technology Group). Those groups probably have one of
the highest densities of SGML/XML experts in the country.

 > What I vaguely have in mind is something like:
 >
 > <sentence>
 > <np case=subject>
 > ...
 > </np>
 > <vp voice=antiantiantipassive>
 > ...
 > </vp>
 > </sentence>

There are a lot of issues to settle before you can define that kind of
DTD (document type definition). And unfortunately, many of them would
be best resolved in different ways for different languages. The most
obvious example I can think of is lojban, where neither "verb phrase"
nor "noun phrase" really applies.

Even without deliberately different languages like lojban, you still
run into trouble. For instance, which grammar are you using? That can
be a problem even within a single language. Most spoken languages
contain ambiguity (you can account for that in the DTD, but you have
to be thinking about it when you design the DTD). And different
grammars might well parse the same sentence in different ways.
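One possible way to build ambiguity into the markup is a wrapper
element holding alternative analyses, each labelled with the grammar
that produced it. In this sketch the `ambig` and `parse` elements, the
`grammar` attribute and its values `A`/`B` are all hypothetical. The
sentence is the classic "I saw the man with the telescope", where the
prepositional phrase can attach either to the noun phrase or to the
verb phrase.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <ambig> holds alternative <parse> elements, each
# labelled with the grammar (here just "A" or "B") that produced it.
DOC = """<ambig>
  <parse grammar="A">
    <sentence>
      <np>I</np>
      <vp>saw <np>the man <pp>with the telescope</pp></np></vp>
    </sentence>
  </parse>
  <parse grammar="B">
    <sentence>
      <np>I</np>
      <vp>saw <np>the man</np> <pp>with the telescope</pp></vp>
    </sentence>
  </parse>
</ambig>"""


def pp_attachment(root):
    """Map each parse's grammar label to the tag of the element directly containing <pp>."""
    result = {}
    for parse in root.iter("parse"):
        # ElementTree keeps no parent pointers, so walk the subtree in
        # document order and stop at the first element with a direct <pp> child.
        for parent in parse.iter():
            if parent.find("pp") is not None:
                result[parse.get("grammar")] = parent.tag
                break
    return result


root = ET.fromstring(DOC)
print(pp_attachment(root))  # {'A': 'np', 'B': 'vp'}
```

Both readings live in one document and software can still tell them
apart; the catch is that the DTD has to allow for a wrapper like
`ambig` from the start.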

These are just a few of the questions that come up in document
markup.


Brook

---------
CONGRESS.SYS Corrupted:  Re-boot Washington D.C (Y/n)?

---------
Fancy. Myth. Magic.
http://www.concentric.net/~nellardo/