Re: General Purpose Dictionary Generator
From: Alex Fink <a4pq1injbok_0@...>
Date: Thursday, October 26, 2006, 7:12
This is a great idea. I'd actually been recently thinking that I should
convert my conlangs' lexica to a more structured format (they're currently
in un-marked-up and inconsistently-formatted human-readable text files) so I
could process them by computer; this would be perfect for that.
On Wed, 25 Oct 2006 19:21:35 -0700, Gary Shannon <fiziwig@...> wrote:
>--- Arthaey Angosii <arthaey@...> wrote:
>
>> Another question came to mind: how would you implement multiple senses
>> of a word? Or subentries? Presumably these would be in rows following
>> the main entry, but how would the dictionary program know they were
>> related?
>>
>>
>> --
>> AA
>>
>> http://conlang.arthaey.com
>>
>
>It might be better to have a database editor as part of the package that would
>allow the flexibility of storing the database as XML instead of as spreadsheet
>data.
[...]
>But that starts getting kind of messy. It would be better to have a database
>editor in outline form where the user could put whatever fields he wanted
>beneath the word in outline form. Something like this:
>
> Eng: hazy
> pos: adj
> alt: -ier
> alt: -iest
> def(1): Marked by the presence of haze.
> def(2): Not clearly defined; unclear or vague.
> form(adv): haz'ily
> form(n): haz'iness
I think a kind of nesting somewhat more general than parametrized field
names like def(1) and form(adv) would be useful (with the underlying
database in XML, that shouldn't be too hard).
Consider, for instance, the Armenian noun |akn|. It has three senses, 'eye,
source, gem', and each sense has a different plural, respectively |ac^hkh|,
|akunkh|, |akankh| (in ad hoc transcription). How would you represent that?
It probably makes sense to store subentries in the XML file as having the
same structure as any other entry, but nested as subsidiaries to the main
entry.
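To make the nesting idea concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The element names (lexicon, entry, headword, pos, gloss, plural) are hypothetical and not part of Gary's proposed format: each sense of |akn| is simply an <entry> nested inside the main one, so it can carry its own plural like any other entry.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: a sub-entry is just an <entry> nested inside its parent,
# so it can carry any field a top-level entry can (here, its own plural).
LEXICON_XML = """
<lexicon>
  <entry>
    <headword>akn</headword>
    <pos>n</pos>
    <entry><gloss>eye</gloss><plural>ac^hkh</plural></entry>
    <entry><gloss>source</gloss><plural>akunkh</plural></entry>
    <entry><gloss>gem</gloss><plural>akankh</plural></entry>
  </entry>
</lexicon>
"""

def senses(entry):
    """Return (gloss, plural) pairs for the direct sub-entries of an entry."""
    return [(sub.findtext("gloss"), sub.findtext("plural"))
            for sub in entry.findall("entry")]

root = ET.fromstring(LEXICON_XML.strip())
for entry in root.findall("entry"):
    head = entry.findtext("headword")
    for gloss, plural in senses(entry):
        print(f"{head} '{gloss}', pl. {plural}")
```

Because a sub-entry has the same shape as a top-level entry, the same rendering code could in principle handle both, and nesting to any depth comes for free.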
>Aside from the tag <list> this amounts to HTML markup of a pattern where words
>beginning with "$" are to be replaced by actual values from the database.
><list> simply keeps churning out copies of the template it contains until it
>uses up all the database fields of that name in the current entry. This way a
>word might have 1 or 50 definitions, but the template would be the same.
I like these list tags. For more flexibility you could allow arbitrary
delimiters; then instead of <clist> you could have <list delim=", ">. (It
might be worthwhile retaining <clist> as a shortcut for this, though.)
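Gary's <list> template, with the delim attribute and <clist> shortcut folded in, could be sketched in Python roughly as follows. The entry layout (a dict mapping each field name to a list of values) and the function names are assumptions made for illustration; the tag syntax follows the <list delim=", "> example above.

```python
import re

FIELD = re.compile(r"\$(\w+)")
# Matches <list>...</list>, <list delim="...">...</list> and <clist>...</clist>.
LIST = re.compile(r'<(list|clist)(?:\s+delim="([^"]*)")?>(.*?)</\1>', re.DOTALL)

def nth(entry, name, i):
    """Value number i of a (possibly repeated) field, or "" if there is none."""
    values = entry.get(name, [])
    return values[i] if i < len(values) else ""

def expand_list(m, entry):
    """Repeat a list body once per value of the fields it uses, then join."""
    tag, delim, body = m.group(1), m.group(2), m.group(3)
    if delim is None:
        delim = ", " if tag == "clist" else ""  # <clist> = <list delim=", ">
    names = FIELD.findall(body)
    count = max((len(entry.get(n, [])) for n in names), default=0)
    copies = [FIELD.sub(lambda f: nth(entry, f.group(1), i), body)
              for i in range(count)]
    return delim.join(copies)

def render(template, entry):
    # Unroll list tags first, then fill any remaining $field with its first value.
    out = LIST.sub(lambda m: expand_list(m, entry), template)
    return FIELD.sub(lambda f: nth(entry, f.group(1), 0), out)

# Hypothetical entry: every field maps to a list, since fields may repeat.
hazy = {
    "Eng": ["hazy"], "pos": ["adj"], "alt": ["-ier", "-iest"],
    "def": ["Marked by the presence of haze.",
            "Not clearly defined; unclear or vague."],
}
template = ('<b>$Eng</b> <i>$pos</i> (<clist>$alt</clist>)'
            '<ol><list><li>$def</li></list></ol>')
print(render(template, hazy))
```

Values are substituted in two passes here, so a field value that itself contained something like $def would be expanded again; a single-pass substitution would avoid that.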
>This is still just the early design phase, so nothing is carved in stone, and
>I'm open to suggestions and improvements. Most importantly, I want it to be
>fairly easy to use while still being flexible enough to be able to format any
>kind of dictionary in any format imaginable.
How about (La)TeX as an output format?
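One wrinkle with TeX output is that several characters common in ad hoc transcriptions are special to TeX; the |ac^hkh| above, for instance, contains ^. A minimal Python sketch, assuming a hypothetical one-line layout (bold headword, italic part of speech, numbered senses), would escape every field before emitting it:

```python
# Characters that are special in LaTeX text mode, mapped to safe replacements.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}

def tex_escape(text):
    """Escape one character at a time so replacements are never re-escaped."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)

def entry_to_latex(headword, pos, definitions):
    # Hypothetical layout: bold headword, italic part of speech, numbered senses.
    senses = " ".join(f"\\textbf{{{n}.}} {tex_escape(d)}"
                      for n, d in enumerate(definitions, start=1))
    return f"\\textbf{{{tex_escape(headword)}}} \\textit{{{tex_escape(pos)}}} {senses}"

print(entry_to_latex("hazy", "adj", ["Marked by the presence of haze.",
                                     "Not clearly defined; unclear or vague."]))
```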
A way to specify cross-references and relationships between entries would be
nice. At the very least, you should be able to get a link from one entry to
another in the HTML output by specifying a cross-reference in one of the
entry fields.
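As one possible approach, a field could mark a cross-reference with some inline syntax, and the HTML generator would turn it into a link to the target entry's anchor. The [[headword]] syntax and the "entry-" id prefix below are hypothetical, chosen only for illustration. This Python sketch escapes the surrounding text and leaves references to nonexistent entries as plain words:

```python
import html
import re

# Hypothetical cross-reference syntax: [[headword]] inside any field value.
XREF = re.compile(r"\[\[([^\]]+)\]\]")

def anchor_id(headword):
    """Make an HTML id for an entry (hypothetical naming scheme)."""
    return "entry-" + re.sub(r"[^\w-]", "_", headword)

def link_xrefs(text, headwords):
    """Escape a field for HTML, turning [[word]] into a link if word is an entry."""
    out, last = [], 0
    for m in XREF.finditer(text):
        out.append(html.escape(text[last:m.start()]))
        target = m.group(1)
        label = html.escape(target)
        if target in headwords:
            out.append(f'<a href="#{anchor_id(target)}">{label}</a>')
        else:
            out.append(label)  # dangling reference: keep the word, unlinked
        last = m.end()
    out.append(html.escape(text[last:]))
    return "".join(out)

print(link_xrefs("cf. [[akn]] & [[foo]]", {"akn", "hazy"}))
```

Each entry's own heading would then need a matching id, e.g. <h3 id="entry-akn">. Replacing non-word characters with _ can make two different headwords collide, so a real implementation would need a collision-free scheme.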
Oh, how about supporting general sort orders on text? I like to keep my
conlang lexica sorted by the appropriate order of the conscript in
romanization (if I've designed a conscript, that is), or at the very least
in a collation order appropriate to the romanization (if |ng| is a digraph
representing one phone it really annoys me to have |ng| words appear between
|ne| and |ni|). I see you've addressed this somewhat with your remarks
about the LOTEP ordering in Piktok. Where the sort order can't be predicted
from the entry itself (say, if the entries are image names), a separate
order field might be the best solution; but if the entries are text, having
to supply one would be annoying.
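For the digraph case, a sort key can be derived from the text itself once the alphabet is declared in order. A minimal Python sketch, assuming a hypothetical alphabet list in which |ng| is a single letter sorting after |n|: each word is split into letters by greedy longest match and mapped to a tuple of alphabet positions.

```python
# Hypothetical alphabet in conscript order; "ng" is one letter sorting after "n".
ALPHABET = ["a", "e", "i", "k", "m", "n", "ng", "o", "p", "s", "t", "u"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}
LONGEST = max(len(letter) for letter in ALPHABET)

def collation_key(word):
    """Split a romanized word into letters by greedy longest match, then rank them."""
    key, i = [], 0
    while i < len(word):
        for size in range(min(LONGEST, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in RANK:
                key.append(RANK[chunk])
                i += size
                break
        else:
            raise ValueError(f"{word!r}: no letter matches at position {i}")
    return tuple(key)

words = ["nga", "ni", "ne", "anga", "ma"]
print(sorted(words, key=collation_key))  # → ['anga', 'ma', 'ne', 'ni', 'nga']
```

Plain string sorting would put |nga| between |ne| and |ni|. Greedy matching can't tell a digraph |ng| from a sequence |n| + |g| across a syllable boundary, so an orthography with both would still need a separator or an explicit sort field.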
Alex