From: | David G. Durand <david@...> |
---|---|
Date: | Friday, November 19, 1999, 0:32 |
DTDs (XML formats) for grammatical analysis are problematic because linguists disagree so radically in their notions of what a proper grammatical analysis is. So it's hard to get them to agree on a standard representation for these things, which have relatively divergent properties (depending on the theoretical orientation of the linguist).

The Text Encoding Initiative faced this problem (an electronic version of the guidelines is at http://etext.virginia.edu/TEI.html). The result was that they created a fairly complex facility that allows one to declare a linguistic representation, and then to mark up text in conjunction with that declaration. For simple projects, devising your own DTD might be simpler than using the full TEI mechanisms ("feature structures"). There are also simpler tags that can be used to attach basic grammatical information like glosses to a text (confusingly, grammatical labels are themselves called "tags" in the corpus linguistics community).

There are various multilingual corpus tools available from the University of Edinburgh, deriving from the MULTEXT project. Those tools deal with the general problem of representing texts plus part-of-speech information plus segmentations (sentence- and clause-level word groups) and alignments (correlated segments in different language variants of a text). The tools are generic XML tools, but the tags used in the project are variations of the TEI tag set.

-- David

>I'm wondering if there is or should be some kind of
>XML definition for language parsing.
>
>http://www.w3.org/XML/
>
>Anybody know anything about this?
>What I vaguely have in mind is something like:
>
><sentence>
><np case=subject>
>...
></np>
><vp voice=antiantiantipassive>
>...
></vp>
></sentence>

_________________________________________
David Durand dgd@cs.bu.edu \ david@dynamicDiagrams.com
http://www.cs.bu.edu/students/grads/dgd/ \ Director of Development
Graduate Student no more! \ Dynamic Diagrams
--------------------------------------------\ http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW \__________________________
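For the simple-project route (a home-grown DTD rather than the full TEI feature-structure machinery), here is a minimal, hypothetical sketch for markup like that in the question. The element and attribute names (`sentence`, `np`, `vp`, `case`, `voice`) come from the question. The content models and the enumerated attribute values are assumptions for illustration only; a linguist of a different theoretical orientation would draw them differently. Note that XML requires attribute values to be quoted, so `case=subject` has to be written `case="subject"`.

```xml
<?xml version="1.0"?>
<!DOCTYPE sentence [
  <!-- Hypothetical declarations; names from the question, models assumed -->
  <!-- A sentence is a sequence of noun and verb phrases -->
  <!ELEMENT sentence (np | vp)+>
  <!-- Noun phrases hold plain text only in this simplified sketch -->
  <!ELEMENT np (#PCDATA)>
  <!ATTLIST np case (subject | object) #IMPLIED>
  <!-- Verb phrases mix text with embedded noun phrases -->
  <!ELEMENT vp (#PCDATA | np)*>
  <!ATTLIST vp voice (active | passive | antipassive) #IMPLIED>
]>
<sentence>
  <np case="subject">The dog</np>
  <vp voice="active">chased <np case="object">the cat</np></vp>
</sentence>
```

The enumerated `case` and `voice` values let a validating parser reject misspelled or unexpected labels, which is the main payoff of a DTD over ad hoc tagging. Switching to a different analysis means editing only the declarations in the internal subset.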