Re: General Purpose Dictionary Generator
From: Alex Fink <a4pq1injbok_0@...>
Date: Friday, October 27, 2006, 9:18
On Thu, 26 Oct 2006 01:17:52 -0700, Arthaey Angosii <arthaey@...> wrote:
>On 10/26/06, Alex Fink <a4pq1injbok_0@...> wrote:
>> This is a great idea. I'd actually been recently thinking that I should
>> convert my conlangs' lexica to a more structured format (they're currently
>> in un-marked-up and inconsistently-formatted human-readable text files) so I
>> could process them by computer; this would be perfect for that.
>
>It's also similar to my own conversion from Shoebox to XML. I'm
>mid-conversion, but I do have an XML Schema. Perhaps it can be used as
>a basis for this program, or at least to spur discussion? In either
>capacity, it might prove helpful.
I'd forgotten about Shoebox; it might be a good idea for this program to
accept Shoebox input in some form, perhaps by first running it through a
converter like (or identical to?) yours.
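
To make the converter idea concrete, here is a minimal sketch in Python of
reading Shoebox-style text into records. It assumes the input uses
backslash field markers and that \lx opens each record; the markers \ps,
\ge and \de are the names as commonly used for part of speech, gloss and
definition, not checked against any particular Shoebox setup. The function
name parse_sfm and the sample words are made up for illustration.

```python
from typing import Dict, List


def parse_sfm(text: str) -> List[Dict[str, List[str]]]:
    """Split backslash-marked text into records, one per \\lx line.

    Each record maps a marker name to the list of values seen for it.
    Lines without a marker are treated as continuations of the previous field.
    """
    records: List[Dict[str, List[str]]] = []
    current = None
    last_marker = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no information here
        if line.startswith("\\"):
            parts = line[1:].split(None, 1)
            if not parts:
                continue  # a lone backslash: nothing to record
            marker = parts[0]
            value = parts[1] if len(parts) > 1 else ""
            if marker == "lx" or current is None:
                current = {}  # assumption: \lx starts a new record
                records.append(current)
            current.setdefault(marker, []).append(value)
            last_marker = marker
        elif current is not None:
            # Continuation line: append to the most recent field value.
            joined = current[last_marker][-1] + " " + line
            current[last_marker][-1] = joined.strip()
    return records


# Hypothetical sample input; the headwords are invented.
SAMPLE = r"""
\lx kasha
\ps n
\ge haze
\de a thin mist that
blurs distant objects

\lx tolu
\ps v
\ge walk
"""

records = parse_sfm(SAMPLE)
```

The resulting list of dictionaries could then be written out as XML or fed
to whatever internal format the program settles on.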
I remember getting the impression, the last time I looked at Shoebox's
format, that it was interlinear-centric (which makes sense). IIRC the main
definition field is the gloss, suitable for interlinear use; of course
you can also have a proper definition and more explanatory notes, but it's
the gloss that's primary. It looks like your schema follows this, and
Gary's proposals seem to have a similar leaning ("hazy" as definition,
"marked by the presence of haze" as note). My own preference would be to
make the longer definition primary and the gloss/metalanguage search key
secondary; this way it's the language-internal divisions of semantic space
and not the equivalences to some other language that are at the forefront.
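
One way to picture that preference is a record where the definition is the
required field and glosses are optional search keys, with a gloss index
built on the side. This is only a sketch under assumptions: the class and
function names (Sense, Entry, build_gloss_index) and the headword are
hypothetical, and nothing here is taken from the schema or from Gary's
proposal beyond the "hazy" example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sense:
    # Primary: the full, language-internal definition (required).
    definition: str
    # Secondary: short metalanguage glosses, used as search keys
    # and for interlinear text.
    glosses: List[str] = field(default_factory=list)
    notes: str = ""


@dataclass
class Entry:
    headword: str
    senses: List[Sense] = field(default_factory=list)


def build_gloss_index(entries: List[Entry]) -> Dict[str, List[str]]:
    """Map each lower-cased gloss to the headwords that carry it."""
    index: Dict[str, List[str]] = {}
    for entry in entries:
        for sense in entry.senses:
            for gloss in sense.glosses:
                index.setdefault(gloss.lower(), []).append(entry.headword)
    return index


# "veshu" is an invented headword; the definition and gloss come from
# the example quoted above.
lexicon = [
    Entry("veshu", [Sense("marked by the presence of haze", ["hazy"])]),
]
gloss_index = build_gloss_index(lexicon)
```

With this layout a sense can exist with no gloss at all, while a lookup by
gloss still works through the index.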
>The schema itself supports much more than shown in the example:
>multiple pronunciation schemes, definitions in addition to short
>glosses, semantic domains, multiple example sentences,
>cross-references (such as synonyms), notes, subentries, and senses.
These are probably questions about Shoebox rather than about your own
design, but:
- why are etymologies cross-references? If ancestral words have their own
entries at all, wouldn't they be in a completely different file?
Probably 'synchronic derivation' and 'diachronic etymology' should be
different fields.
- is there provision for differentiating word class from, um, word subclass,
from morphological information? Like "noun, masculine, /nd/-stem", or
"verb, subject is patientive, third conjugation"?
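
Both questions amount to asking for separate fields. A minimal sketch of
what that separation might look like, in Python: the names (Grammar,
LexEntry, derived_from, etymology, describe, unresolved_derivations) and
the sample headwords are hypothetical, not fields of Shoebox or of the
schema under discussion.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Grammar:
    word_class: str                                       # e.g. "noun"
    subclasses: List[str] = field(default_factory=list)   # e.g. ["masculine"]
    morphology: List[str] = field(default_factory=list)   # e.g. ["/nd/-stem"]


@dataclass
class LexEntry:
    headword: str
    grammar: Grammar
    # Synchronic derivation: headwords in this same lexicon.
    derived_from: List[str] = field(default_factory=list)
    # Diachronic etymology: free text or a pointer into another file.
    etymology: Optional[str] = None


def describe(grammar: Grammar) -> str:
    """Render the three levels as one human-readable label."""
    return ", ".join([grammar.word_class] + grammar.subclasses + grammar.morphology)


def unresolved_derivations(entries: List[LexEntry]) -> List[Tuple[str, str]]:
    """List (headword, source) pairs whose source is not in this lexicon."""
    known = {e.headword for e in entries}
    return [(e.headword, src)
            for e in entries
            for src in e.derived_from
            if src not in known]


# Invented headwords, using the grammatical labels from the examples above.
noun = LexEntry("tand", Grammar("noun", ["masculine"], ["/nd/-stem"]),
                etymology="ancestral form kept in a separate file")
verb = LexEntry("tandil",
                Grammar("verb", ["subject is patientive"], ["third conjugation"]),
                derived_from=["tand"])
```

Keeping derived_from as in-file references means they can be checked
mechanically, while etymology stays free to point outside the file.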
>Below my signature (for easy skipping) is the 193-line schema file. (I
>would have attached it and not bothered those not interested, but I
>assume the listserv kills attachments.)
>
>Please also note that it's my first schema, and as such I may have
>done things in less-than-optimal ways just to get it to validate. :P
You could've fooled me! I hadn't actually seen an XML Schema before this one.
Looking through your tech page, it looks like you've already got a number
of the components Gary's planning to write. Are they very
Asha'ille-specific and tied to your formatting, or could they generalise?
In the latter case, we might have starting points for a number of aspects
of the project at hand.
Alex