Re: XML for linguists?

From: David G. Durand <david@...>
Date: Friday, November 19, 1999, 0:32

DTDs (XML formats) for grammatical analysis are problematic because
linguists disagree so radically about what a proper grammatical analysis
is. That makes it hard to get them to agree on a standard representation
for structures whose properties diverge considerably depending on the
theoretical orientation of the linguist.

The Text Encoding Initiative faced this problem (electronic version of the
guidelines is at http://etext.virginia.edu/TEI.html). The result was that
they created a fairly complex facility that allows one to declare a
linguistic representation, and then to mark up text in conjunction with
that declaration.
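
To make the "declare first, then mark up" idea concrete, here is a minimal
sketch in Python. It is not TEI syntax and not how the TEI tools work; the
feature names, the allowed values, and the function check_annotation are
all made up for illustration. The point is only that the annotation is
checked against a declaration the linguist supplies, so each theory can
bring its own inventory.

```python
# Hypothetical declaration: each feature name maps to its allowed values.
# A linguist with a different theory would supply a different table.
DECLARATION = {
    "case": {"nominative", "accusative", "ergative", "absolutive"},
    "number": {"singular", "plural"},
}

def check_annotation(features, declaration=DECLARATION):
    """Return a list of problems in one feature bundle (empty if valid)."""
    problems = []
    for name, value in features.items():
        if name not in declaration:
            problems.append(f"undeclared feature: {name}")
        elif value not in declaration[name]:
            problems.append(f"bad value for {name}: {value}")
    return problems

# A bundle that fits the declaration, and one that does not.
ok = check_annotation({"case": "ergative", "number": "plural"})
bad = check_annotation({"case": "dative"})
```

Here ok comes back empty, while bad reports that "dative" is not a
declared value for "case".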

For simple projects, devising your own DTD might be simpler than using
the full TEI mechanisms ("feature structures"). There are also simpler
tags that can be used to attach basic grammatical information like glosses
to a text (confusingly, grammatical items are called "tags" in the corpus
linguistics community).
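
As a rough illustration of the home-grown route, the sketch below builds a
sentence whose words carry a gloss and a part-of-speech label. The element
names "s" and "w" and the attributes "gloss" and "pos" are assumptions
chosen for this example, not TEI tags, and the Latin sample is invented.
It uses only Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

def build_glossed_sentence(words):
    """Build an <s> element with one <w> child per (form, gloss, pos) triple."""
    sentence = ET.Element("s")
    for form, gloss, pos in words:
        # Hypothetical attribute names; a real project would fix these in its DTD.
        w = ET.SubElement(sentence, "w", {"gloss": gloss, "pos": pos})
        w.text = form
    return sentence

example = build_glossed_sentence([
    ("canis", "dog.NOM.SG", "N"),
    ("currit", "run.3SG.PRES", "V"),
])

# Serialize to a string so the markup can be inspected or saved.
xml_text = ET.tostring(example, encoding="unicode")
```

Unlike the unquoted attributes in the sketch quoted at the end of this
message, the serializer always quotes attribute values, which XML requires.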

There are various multilingual corpus tools available from the University
of Edinburgh, deriving from the MULTEXT project. Those tools deal with the
general problem of representing texts plus part-of-speech information plus
segmentations (sentence- and clause-level word groups) and alignments
(correlated segments in different language variants of a text). The tools
are generic XML tools, but the tags used in the project are variations of
the TEI tagset.
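
Segmentation and alignment can be pictured with a small sketch: each text
is cut into identified segments, and an alignment is just a list of
segment-id pairs. This is not the MULTEXT format or its tools; the element
names "text" and "seg", the id scheme, and the sample sentences are
assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def segment(text_id, lang, sentences):
    """Wrap each sentence in a <seg> with an id like 'en.s1' (hypothetical scheme)."""
    text = ET.Element("text", {"id": text_id, "lang": lang})
    for n, sentence in enumerate(sentences, start=1):
        seg = ET.SubElement(text, "seg", {"id": f"{text_id}.s{n}"})
        seg.text = sentence
    return text

def aligned_pairs(source, target, links):
    """Resolve (source_id, target_id) links into pairs of segment texts."""
    by_id = {
        seg.get("id"): seg.text
        for text in (source, target)
        for seg in text.iter("seg")
    }
    return [(by_id[a], by_id[b]) for a, b in links]

english = segment("en", "en", ["The dog runs.", "It is fast."])
french = segment("fr", "fr", ["Le chien court.", "Il est rapide."])

# The alignment itself: which segment corresponds to which.
links = [("en.s1", "fr.s1"), ("en.s2", "fr.s2")]
pairs = aligned_pairs(english, french, links)
```

Keeping the links separate from the texts means the same segmented text
can be aligned against several translations without being modified.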

   -- David

>I'm wondering if there is or should be some kind of
>XML definition for language parsing.
>
>http://www.w3.org/XML/
>
>Anybody know anything about this?
>What I vaguely have in mind is something like:
>
><sentence>
><np case=subject>
>...
></np>
><vp voice=antiantiantipassive>
>...
></vp>
></sentence>

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
http://www.cs.bu.edu/students/grads/dgd/  \  Director of Development
    Graduate Student no more!              \  Dynamic Diagrams
--------------------------------------------\  http://www.dynamicDiagrams.com/
   MAPA: mapping for the WWW                 \__________________________