
Re: XML for linguists?

From:Boudewijn Rempt <bsarempt@...>
Date:Wednesday, November 10, 1999, 11:32
On Wed, 10 Nov 1999, Don Blaheta wrote:

> Quoth Charles:
> > I'm wondering if there is or should be some kind of
> > XML definition for language parsing.
>
> Probably, yes.
One interesting article is in Nerbonne 1998, titled _Markup of a Test Suite with SGML_, by Martin Volk. (The book itself, Linguistic Databases, contains more interesting papers, and a few duds.)
> and this would in fact resolve one or two infelicities in the system
> having to do with null constituents (traces and the like). The problem
> is, even though this is much better for all the reasons XML usually is,
> it wouldn't be accepted because it would triple or quadruple the size of
> the corpus, for no "obvious" gain.
Well, it needn't - I wouldn't store the texts in XML form, but in a relational database. XML texts can be readily mapped to a normalized database, where they take far less space, and they can be extracted and put into DOM form just as easily as if they were read from a text file in XML format.
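To make the mapping concrete, here is a minimal sketch of one way it could be done, using Python with SQLite. The two-table schema (`node`, `attribute`), the helper names `store` and `load`, and the sample sentence are all assumptions made for illustration; the tag is borrowed from the examples quoted further down. It is not a claim about how any existing corpus is stored, and it makes no attempt to measure space savings.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical normalized schema: one row per element, one row per attribute.
SCHEMA = """
CREATE TABLE node (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES node(id),
    position  INTEGER NOT NULL,   -- order among siblings
    tag       TEXT NOT NULL,
    text      TEXT,               -- text before the first child
    tail      TEXT                -- text after the closing tag
);
CREATE TABLE attribute (
    node_id INTEGER NOT NULL REFERENCES node(id),
    name    TEXT NOT NULL,
    value   TEXT NOT NULL,
    PRIMARY KEY (node_id, name)
);
"""

def store(conn, elem, parent_id=None, position=0):
    """Recursively write an element tree into the tables; return the row id."""
    cur = conn.execute(
        "INSERT INTO node (parent_id, position, tag, text, tail) "
        "VALUES (?, ?, ?, ?, ?)",
        (parent_id, position, elem.tag, elem.text, elem.tail),
    )
    node_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO attribute (node_id, name, value) VALUES (?, ?, ?)",
        [(node_id, name, value) for name, value in elem.attrib.items()],
    )
    for i, child in enumerate(elem):
        store(conn, child, node_id, i)
    return node_id

def load(conn, node_id):
    """Rebuild the element tree rooted at the given row id."""
    tag, text, tail = conn.execute(
        "SELECT tag, text, tail FROM node WHERE id = ?", (node_id,)
    ).fetchone()
    attrs = dict(conn.execute(
        "SELECT name, value FROM attribute WHERE node_id = ?", (node_id,)
    ).fetchall())
    elem = ET.Element(tag, attrs)
    elem.text, elem.tail = text, tail
    children = conn.execute(
        "SELECT id FROM node WHERE parent_id = ? ORDER BY position", (node_id,)
    ).fetchall()
    for (child_id,) in children:
        elem.append(load(conn, child_id))
    return elem

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up sample sentence, for illustration only.
source = '<S subtype="INV" function="ADV"><NP>the cat</NP><VP>sat</VP></S>'
root_id = store(conn, ET.fromstring(source))
restored = load(conn, root_id)
```

The rebuilt tree can be serialized with `ET.tostring(restored)` and, if a W3C-style DOM is wanted, handed to `xml.dom.minidom.parseString`. Attribute order is not recorded in this sketch, so the round trip preserves structure and content but not necessarily the original byte-for-byte text.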
> Also, there is some question of what
> level of information to put into the tag name and how much to leave in
> the arguments. That is,
> <constituent type="SINV" function="ADV">
> or
> <constituent type="S" subtype="INV" function="ADV">
> or
> <S subtype="INV" function="ADV">
> or
> <SINV function="ADV">
> ? In any case, I'll ask my advisor (and some other people around here)
> to see if any work in this direction has been done.
Yes, that's one of my quandaries (if that's the word I want), too. If I normalize everything then I don't keep anything between the opening and closing tags...

Boudewijn Rempt | http://denden.conlang.org/~bsarempt