
Re: General Purpose Dictionary Generator

From:Alex Fink <a4pq1injbok_0@...>
Date:Friday, October 27, 2006, 9:18
On Thu, 26 Oct 2006 01:17:52 -0700, Arthaey Angosii <arthaey@...> wrote:

>On 10/26/06, Alex Fink <a4pq1injbok_0@...> wrote:
>> This is a great idea. I'd actually been recently thinking that I should
>> convert my conlangs' lexica to a more structured format (they're currently
>> in un-marked-up and inconsistently-formatted human-readable text files) so I
>> could process them by computer; this would be perfect for that.
>
>It's also similar to my own conversion from Shoebox to XML. I'm
>mid-conversion, but I do have an XML Schema. Perhaps it can be used as
>a basis for this program, or at least to spur discussion? In either
>capacity, it might prove helpful.

I'd forgotten about Shoebox; it might be a good idea for this program to accept Shoebox input in some form, perhaps by first running it through a converter like (or identical to?) yours. I remember getting the impression the last time I looked at Shoebox's format that it was interlinear-centric (which makes sense). IIRC the main definition field is the gloss, suitable for interlinear use, and of course you can also have a proper definition and more explicatory notes but it's the gloss that's primary. It looks like your schema follows this, and Gary's proposals seem to have a similar leaning ("hazy" as definition, "marked by the presence of haze" as note). My own preference would be to make the longer definition primary and the gloss/metalanguage search key secondary; this way it's the language-internal divisions of semantic space and not the equivalences to some other language that are at the forefront.
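
To make that preference concrete, here is a minimal sketch in Python of an entry whose primary field is the longer definition, with glosses kept only as secondary search keys. The class names, field names, and the headword "velu" are made up for illustration; none of them come from Shoebox, from your schema, or from Gary's proposal.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sense:
    # Primary: the language-internal description of the meaning.
    definition: str
    # Secondary: short metalanguage glosses, usable as interlinear
    # labels or as search keys.
    glosses: List[str] = field(default_factory=list)
    notes: str = ""


@dataclass
class Entry:
    headword: str  # hypothetical field name
    senses: List[Sense]


def gloss_index(entries: List[Entry]) -> Dict[str, List[str]]:
    """Build a reverse lookup from each gloss to the headwords using it."""
    index: Dict[str, List[str]] = {}
    for entry in entries:
        for sense in entry.senses:
            for gloss in sense.glosses:
                index.setdefault(gloss, []).append(entry.headword)
    return index


# "velu" is an invented headword; the definition and gloss reuse the
# "hazy" example mentioned above.
lexicon = [Entry("velu", [Sense("marked by the presence of haze", ["hazy"])])]
print(gloss_index(lexicon))
```

The gloss index is derived from the entries rather than stored in them, so the metalanguage equivalents stay a lookup aid and the definition stays the authoritative field.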

>The schema itself supports much more than shown in the example:
>multiple pronunciation schemes, definitions in addition to short
>glosses, semantic domains, multiple example sentences,
>cross-references (such as synonyms), notes, subentries, and senses.

These probably resolve to questions about Shoebox rather than your own designs, but:
- why are etymologies cross-references? If ancestral words have their own entries at all, wouldn't they be in a completely different file? Probably 'synchronic derivation' and 'diachronic etymology' should be different fields.
- is there provision for differentiating word class from, um, word subclass, from morphological information? Like "noun, masculine, /nd/-stem", or "verb, subject is patientive, third conjugation"?
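
As a rough illustration of what I mean by both points, here is a sketch in Python. All names are hypothetical, and the comma-separated convention (class first, subclass second, everything else morphology) is just an assumption I'm making for the example, not something Shoebox or your schema prescribes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Grammar:
    word_class: str                      # e.g. "noun", "verb"
    subclass: Optional[str] = None       # e.g. "masculine"
    morphology: List[str] = field(default_factory=list)  # e.g. ["/nd/-stem"]


@dataclass
class LexEntry:
    headword: str
    grammar: Grammar
    # Synchronic derivation: headwords in this same lexicon.
    derived_from: List[str] = field(default_factory=list)
    # Diachronic etymology: a free-text pointer to an ancestral form,
    # which may live in a completely different file.
    etymology: Optional[str] = None


def parse_grammar(label: str) -> Grammar:
    """Split a label such as "noun, masculine, /nd/-stem" into parts.

    Assumed convention: first item is the word class, second the
    subclass, and any remaining items are morphological information.
    """
    parts = [p.strip() for p in label.split(",") if p.strip()]
    if not parts:
        raise ValueError("empty grammar label")
    subclass = parts[1] if len(parts) > 1 else None
    return Grammar(parts[0], subclass, parts[2:])


print(parse_grammar("noun, masculine, /nd/-stem"))
print(parse_grammar("verb, subject is patientive, third conjugation"))
```

Keeping `derived_from` and `etymology` as separate fields means a tool can resolve the former against the current lexicon without ever needing the ancestral language's file.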

>Below my signature (for easy skipping) is the 193-line schema file. (I
>would have attached it and not bothered those not interested, but I
>assume the listserv kills attachments.)
>
>Please also note that it's my first schema, and as such I may have
>done things in less-than-optimal ways just to get it to validate. :P

You could've fooled me! I hadn't actually seen an XML Schema before this one.

Looking through your tech page it looks like you've actually got a number of components of what Gary's planning to write. Are they very Asha'ille-specific and specific to your formatting, or could they generalise? We might have starting points for a number of aspects of the project at hand, in the latter case.

Alex
