Re: XML for linguists?
From: Boudewijn Rempt <bsarempt@...>
Date: Wednesday, November 10, 1999, 11:32
On Wed, 10 Nov 1999, Don Blaheta wrote:
> Quoth Charles:
> > I'm wondering if there is or should be some kind of
> > XML definition for language parsing.
>
> Probably, yes.
>
One interesting article is in Nerbonne 1998, titled _Markup of a
Test Suite with SGML_, by Martin Volk. (The book itself, Linguistic
Databases, contains more interesting papers, and a few duds.)
> and this would in fact resolve one or two infelicities in the system
> having to do with null constituents (traces and the like). The problem
> is, even though this is much better for all the reasons XML usually is,
> it wouldn't be accepted because it would triple or quadruple the size of
> the corpus, for no "obvious" gain.
Well, it needn't - I wouldn't store the texts in XML form, but in a
relational database. XML texts can be readily mapped to a normalized
database, where they take far less space, and they can be extracted and
put into DOM form just as easily as if they were read from a text
file in XML format.
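
To make the mapping concrete, here is a rough sketch of what I have in
mind, not a finished design. Everything in it is an assumption made for
illustration: the two-table schema (node, attribute), the function names
store/load/roundtrip, SQLite as the database, and Python's ElementTree
standing in for a real DOM.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema: one row per element, one row per attribute.
SCHEMA = """
CREATE TABLE node (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES node(id),
    position  INTEGER NOT NULL,   -- order among siblings
    tag       TEXT NOT NULL,
    text      TEXT,               -- text before the first child
    tail      TEXT                -- text after the closing tag
);
CREATE TABLE attribute (
    node_id INTEGER NOT NULL REFERENCES node(id),
    name    TEXT NOT NULL,
    value   TEXT NOT NULL
);
"""

def store(conn, elem, parent_id=None, position=0):
    """Write elem and all its descendants to the database; return its row id."""
    cur = conn.execute(
        "INSERT INTO node (parent_id, position, tag, text, tail) "
        "VALUES (?, ?, ?, ?, ?)",
        (parent_id, position, elem.tag, elem.text, elem.tail),
    )
    node_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO attribute (node_id, name, value) VALUES (?, ?, ?)",
        [(node_id, name, value) for name, value in elem.attrib.items()],
    )
    for i, child in enumerate(elem):
        store(conn, child, node_id, i)
    return node_id

def load(conn, node_id):
    """Rebuild the element tree rooted at node_id from the database."""
    tag, text, tail = conn.execute(
        "SELECT tag, text, tail FROM node WHERE id = ?", (node_id,)
    ).fetchone()
    attrs = dict(conn.execute(
        "SELECT name, value FROM attribute WHERE node_id = ? ORDER BY rowid",
        (node_id,),
    ).fetchall())
    elem = ET.Element(tag, attrs)
    elem.text, elem.tail = text, tail
    children = conn.execute(
        "SELECT id FROM node WHERE parent_id = ? ORDER BY position",
        (node_id,),
    ).fetchall()
    for (child_id,) in children:
        elem.append(load(conn, child_id))
    return elem

def roundtrip(xml_text):
    """Parse XML, store it in an in-memory database, and load it back."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    root_id = store(conn, ET.fromstring(xml_text))
    conn.commit()
    return load(conn, root_id)
```

Calling roundtrip() on a small fragment such as
<S subtype="INV" function="ADV"><NP>the cat</NP><VP>sat</VP></S>
should give back a tree with the same tags, attributes and text. Whether
this really saves space depends on the schema and the corpus; the sketch
only shows that the round trip is mechanical.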
> Also, there is some question of what
> level of information to put into the tag name and how much to leave in
> the arguments. That is,
> <constituent type="SINV" function="ADV">
> or
> <constituent type="S" subtype="INV" function="ADV">
> or
> <S subtype="INV" function="ADV">
> or
> <SINV function="ADV">
> ? In any case, I'll ask my advisor (and some other people around here)
> to see if any work in this direction has been done.
Yes, that's one of my quandaries (if that's the word I want), too. If I
normalize everything then I don't keep anything between the opening and
closing tags...
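
For what it's worth, the four spellings quoted above seem to carry the
same information, so a reader could fold them into one form when
loading. A small sketch, under stated assumptions: the FUSED table and
the normalize function are hypothetical names of mine, and the split of
"SINV" into "S" plus "INV" is taken only from the examples in the quote.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup of fused labels and how they split into
# (category, subtype). Only the example from the quoted text is listed.
FUSED = {"SINV": ("S", "INV")}

def normalize(elem):
    """Return (category, subtype, function) for any of the four spellings."""
    # Generic <constituent type="..."> keeps the label in an attribute;
    # otherwise the tag name itself is the label.
    if elem.tag == "constituent":
        label = elem.get("type")
    else:
        label = elem.tag
    subtype = elem.get("subtype")
    # Split a fused label only when no explicit subtype was given.
    if subtype is None and label in FUSED:
        label, subtype = FUSED[label]
    return (label, subtype, elem.get("function"))

variants = [
    '<constituent type="SINV" function="ADV"/>',
    '<constituent type="S" subtype="INV" function="ADV"/>',
    '<S subtype="INV" function="ADV"/>',
    '<SINV function="ADV"/>',
]
results = [normalize(ET.fromstring(v)) for v in variants]
```

All four variants come out as the same triple, which suggests the choice
between them is more about convenience for whoever queries the data than
about what can be expressed.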
Boudewijn Rempt | http://denden.conlang.org/~bsarempt