Re: XML for linguists?
From: Boudewijn Rempt <bsarempt@...>
Date: Tuesday, November 9, 1999, 21:10
On Tue, 9 Nov 1999, Charles wrote:
> I'm wondering if there is or should be some kind of
> XML definition for language parsing.
>
It's not the same, but I'm currently experimenting with XML for
importing texts into Kura, handling them inside it, and exporting
them again, based on my current datamodel.
Something like:
<?xml version="1.0" encoding="UTF-8" standalone='yes' ?>
<!DOCTYPE kura_interlinear_text [
  <!ELEMENT kura_interlinear_text (#PCDATA)>
]>
<kura_interlinear_text>
  <text title="Lamay Neranmen"
        description="Wander Song"
        language="denden">
    <stream text="Edo qoiqoi sümzi nerananmen" language="denden">
      <e>edo
        <tag name="TR">my</tag>
        <e>e
          <tag name="GL">poss</tag></e>
        <e>do
          <tag name="GL">1sMGH</tag></e>
      </e>
      <e text="qoiqoi">...</e>
      <e text="sümzi">...</e>
      <e text="nerananmen">...</e>
    </stream>
    <stream text="Süsü-ümen edi hod-atahl par" language="denden">
    </stream>
  </text>
  <text title="Lama Hosame">
  </text>
</kura_interlinear_text>
However, that makes for long documents, and it has little to do with
natural language parsing. Besides, I lack a good reference guide to
XML since O'Reilly has only a 100-page booklet, and I am loath to buy
from another publisher - so this XML text isn't actually valid :-(.
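
On the validity point, a small check may help. As far as can be told
from the sample, the markup itself is well-formed, so a non-validating
parser such as expat (which Python's xml.dom.minidom uses) loads it;
only a validating parser would object, because the DTD declares just
the root element, and declares it as #PCDATA. The sketch below uses a
cut-down copy of the sample and is an illustration, not part of Kura.

```python
from xml.dom import minidom

# Cut-down copy of the sample: same DTD, one stream, one word.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" standalone='yes' ?>
<!DOCTYPE kura_interlinear_text [
<!ELEMENT kura_interlinear_text (#PCDATA)>
]>
<kura_interlinear_text>
<text title="Lamay Neranmen" language="denden">
<stream text="Edo qoiqoi" language="denden">
<e>edo
<tag name="TR">my</tag>
</e>
</stream>
</text>
</kura_interlinear_text>"""

# expat checks well-formedness only; it does not validate against the DTD,
# so this call succeeds even though the DTD allows no child elements.
doc = minidom.parseString(SAMPLE)
root = doc.documentElement

# Every element name that actually occurs in the document, root included.
used = sorted({n.tagName for n in root.getElementsByTagName("*")}
              | {root.tagName})

print("parsed, doctype:", doc.doctype.name)
print("elements used:", used)
```

Only kura_interlinear_text appears in an <!ELEMENT ...> declaration,
while text, stream, e and tag are all in use; each of them would need
its own declaration for the document to validate.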
Taliessin pointed me to an interesting paper at www.sil.org about
interlinear glossing:
http://www.sil.org/silewp/1997/003/SILEWP1997-003.html
They take a more line-oriented approach, tagging per line instead
of per element. Still, parsing XML is really easy, as is working with
DOM structures. Going from XML to HTML might be a bit more difficult: I
need to translate all the <e> elements with their <tag> sub-elements
into something parallel. My problem remains that the text is a linear
flow of complex elements, where each sub-element has its own place in
the flow.
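
One way the <e>-to-parallel step could look, sketched with Python's
xml.dom.minidom. This is a rough illustration under several assumptions
that are not in the original design: TR is read as "translation" and GL
as "gloss", only two levels of <e> nesting are handled (word, then
morphemes), and the helper names and the HTML class names are made up
for the example.

```python
from html import escape
from xml.dom import minidom

# Trimmed copy of the sample: one stream, one analysed word, one stub.
SAMPLE = """<kura_interlinear_text>
<text title="Lamay Neranmen" language="denden">
<stream text="Edo qoiqoi" language="denden">
<e>edo
<tag name="TR">my</tag>
<e>e
<tag name="GL">poss</tag></e>
<e>do
<tag name="GL">1sMGH</tag></e>
</e>
<e text="qoiqoi">...</e>
</stream>
</text>
</kura_interlinear_text>"""


def child_elements(node, name):
    """Direct child elements of `node` with the given tag name."""
    return [c for c in node.childNodes
            if c.nodeType == c.ELEMENT_NODE and c.tagName == name]


def own_text(node):
    """Text directly inside `node`, ignoring text of nested elements."""
    return "".join(c.data for c in node.childNodes
                   if c.nodeType == c.TEXT_NODE).strip()


def form_of(e):
    """Surface form: the text attribute if present, else the own text."""
    return e.getAttribute("text") if e.hasAttribute("text") else own_text(e)


def tags_of(e):
    """Map each direct <tag> child's name (TR, GL, ...) to its text."""
    return {t.getAttribute("name"): own_text(t)
            for t in child_elements(e, "tag")}


def stream_to_html(stream):
    """Render one <stream> as a table with one column per top-level <e>."""
    rows = {"form": [], "morphemes": [], "gloss": [], "translation": []}
    for word in child_elements(stream, "e"):
        parts = child_elements(word, "e")  # nested <e> = morphemes (assumed)
        rows["form"].append(form_of(word))
        rows["morphemes"].append("-".join(form_of(p) for p in parts))
        rows["gloss"].append("-".join(tags_of(p).get("GL", "") for p in parts))
        rows["translation"].append(tags_of(word).get("TR", ""))
    lines = ["<table>"]
    for label, cells in rows.items():
        tds = "".join("<td>%s</td>" % escape(c) for c in cells)
        lines.append('<tr class="%s">%s</tr>' % (label, tds))
    lines.append("</table>")
    return "\n".join(lines)


doc = minidom.parseString(SAMPLE.encode("utf-8"))
html = stream_to_html(doc.getElementsByTagName("stream")[0])
print(html)
```

Each top-level <e> becomes one table column, so the form, the morpheme
split, the glosses and the translation stay aligned per word while the
words keep their order in the flow. Deeper nesting would need a
recursive walk instead of the fixed two levels used here.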
Boudewijn Rempt | http://denden.conlang.org/~bsarempt