Re: General Purpose Dictionary Generator
From: Gary Shannon <fiziwig@...>
Date: Thursday, October 26, 2006, 2:21
--- Arthaey Angosii <arthaey@...> wrote:
It might be better to have a database editor as part of the package that would
allow the flexibility of storing the database as XML instead of as a spreadsheet.
Consider this English dictionary entry:
hazy » adj. -ier, -iest 1. Marked by the presence of haze. 2. Not clearly
defined; unclear or vague. --haz'ily adv. --haz'iness n.
Suppose we want the main entry and the alternate endings and alternate forms to
be in sans-serif bold type, the parts of speech in Times Roman italic, and the
definition in Times Roman. We could format the spreadsheet data like this:
row, word,   pos,    alt,     define
1,   "hazy", "adj.", "-ier",
2,   ,       ,       "-iest",
3,   ,       ,       ,        "Marked by the presence of haze."
4,   ,       ,       ,        "Not clearly defined; unclear or vague."
But that starts getting kind of messy. It would be better to have a database
editor in outline form, where the user could put whatever fields he wanted
beneath the word. Something like this:
def(1): Marked by the presence of haze.
def(2): Not clearly defined; unclear or vague.
Internally it would be stored as XML, but to the user, via the editor, it would
just be outline form with whatever fields the user felt like defining. Within
the template, database fields would be identified with "$", and a few
special-purpose field names like $_VAL (described below) would be defined.
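As a rough illustration of the outline-to-XML idea, here is a minimal Java
sketch that parses one outline line such as "def(1): ..." into a name, an
optional value in parens, and the text. The class name OutlineField, the regular
expression, and the <field name=... val=...> XML shape are all assumptions made
for this example; nothing about the actual storage format has been decided.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** One outline line: a field name, an optional value in parens, and the text. */
final class OutlineField {
    // Matches lines such as "def(1): some text" or "pos: adj."
    private static final Pattern LINE =
        Pattern.compile("^\\s*(\\w+)(?:\\(([^)]*)\\))?:\\s*(.*)$");

    final String name; // e.g. "def"
    final String val;  // e.g. "1"; empty when the line has no parens
    final String text; // e.g. "Marked by the presence of haze."

    OutlineField(String name, String val, String text) {
        this.name = name;
        this.val = val;
        this.text = text;
    }

    /** Parses a single outline line, rejecting lines that do not fit the pattern. */
    static OutlineField parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an outline line: " + line);
        }
        String val = m.group(2) == null ? "" : m.group(2);
        return new OutlineField(m.group(1), val, m.group(3).trim());
    }

    /** Parses every non-blank line of an outline, keeping the original order. */
    static List<OutlineField> parseAll(String outline) {
        List<OutlineField> fields = new ArrayList<>();
        for (String line : outline.split("\\R")) {
            if (!line.trim().isEmpty()) {
                fields.add(parse(line));
            }
        }
        return fields;
    }

    /** Hypothetical XML form; the element and attribute names are made up here. */
    String toXml() {
        return "<field name=\"" + escape(name) + "\" val=\"" + escape(val) + "\">"
            + escape(text) + "</field>";
    }

    // Escapes the characters that are unsafe in XML text and quoted attributes.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String outline = "def(1): Marked by the presence of haze.\n"
                       + "def(2): Not clearly defined; unclear or vague.";
        for (OutlineField f : parseAll(outline)) {
            System.out.println(f.toXml());
        }
    }
}
```

Keeping the value in parens separate from the text is what would let a template
refer to it later through $_VAL.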
Now we could format the entry the way we wanted it like this:
<b>$Eng »</b><i>$pos</i> <clist><b>$alt</b></clist> <list><b>$_VAL.
$def</b></list> <list><b>--$form</b> <i>$_VAL</i></list>
This can be broken down as follows:
<b>$Eng »</b> displays the database field "Eng" (the English word)
in BOLD followed by »
<i>$pos</i> displays the database field "pos" (part of speech)
<clist><b>$alt</b></clist> comma separated list of all the database
fields named "alt", in BOLD
<list><b>$_VAL. $def</b></list> list of all the database fields
named "def", preceded by the value inside the parens in
the database (i.e. the definition number $_VAL)
<list><b>--$form</b> <i>$_VAL</i></list> list of all the database
fields named "form" followed by the value inside the parens
in the database (in this case, the part of speech)
Aside from the tags <list> and <clist>, this amounts to HTML markup of a pattern
where words beginning with "$" are to be replaced by actual values from the
database. <list> simply keeps churning out copies of the template it contains
until it uses up all the database fields of that name in the current entry;
<clist> does the same but separates the copies with commas. This way a word
might have 1 or 50 definitions, but the template would be the same.
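The expansion described above could be sketched in Java roughly as follows.
This is only an illustration under several assumptions: the class and method
names (TemplateSketch, Field, expand) are invented for the example, an entry is
held as a flat list of name/value/text fields, copies inside <list> are joined
with a single space, and the field a list iterates over is taken to be the first
"$" name inside it other than $_VAL. Nested lists are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TemplateSketch {
    /** One database field: name, value from the parens (may be empty), and text. */
    static final class Field {
        final String name, val, text;
        Field(String name, String val, String text) {
            this.name = name; this.val = val; this.text = text;
        }
    }

    private static final Pattern LIST = Pattern.compile("<(c?list)>(.*?)</\\1>", Pattern.DOTALL);
    private static final Pattern VAR = Pattern.compile("\\$(\\w+)");

    /** Expands list blocks and "$" names in the template for one entry. */
    static String expand(String template, List<Field> entry) {
        StringBuilder out = new StringBuilder();
        Matcher m = LIST.matcher(template);
        int pos = 0;
        while (m.find()) {
            // Text outside list blocks uses the first field of each name.
            out.append(fillScalars(template.substring(pos, m.start()), entry));
            String sep = m.group(1).equals("clist") ? ", " : " "; // " " is an assumption
            out.append(expandList(m.group(2), sep, entry));
            pos = m.end();
        }
        out.append(fillScalars(template.substring(pos), entry));
        return out.toString();
    }

    // Emits one copy of the body per field whose name matches the list's field.
    private static String expandList(String body, String sep, List<Field> entry) {
        String listName = null;
        Matcher v = VAR.matcher(body);
        while (v.find()) {
            if (!v.group(1).equals("_VAL")) { listName = v.group(1); break; }
        }
        List<String> copies = new ArrayList<>();
        for (Field f : entry) {
            if (f.name.equals(listName)) {
                copies.add(substitute(body, n ->
                    n.equals("_VAL") ? f.val : n.equals(f.name) ? f.text : null));
            }
        }
        return String.join(sep, copies);
    }

    // Replaces each "$name" with the text of the first field carrying that name.
    private static String fillScalars(String text, List<Field> entry) {
        return substitute(text, n -> {
            for (Field f : entry) {
                if (f.name.equals(n)) return f.text;
            }
            return null;
        });
    }

    // Generic "$name" replacement; names the lookup does not know are left as-is.
    private static String substitute(String text, Function<String, String> lookup) {
        Matcher v = VAR.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (v.find()) {
            String value = lookup.apply(v.group(1));
            v.appendReplacement(sb, Matcher.quoteReplacement(value != null ? value : v.group()));
        }
        v.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Field layout for "hazy" is assumed; only the def(n) lines appear in the outline.
        List<Field> hazy = Arrays.asList(
            new Field("Eng", "", "hazy"), new Field("pos", "", "adj."),
            new Field("alt", "", "-ier"), new Field("alt", "", "-iest"),
            new Field("def", "1", "Marked by the presence of haze."),
            new Field("def", "2", "Not clearly defined; unclear or vague."),
            new Field("form", "adv.", "haz'ily"), new Field("form", "n.", "haz'iness"));
        String template = "<b>$Eng \u00BB</b><i>$pos</i> <clist><b>$alt</b></clist> "
            + "<list><b>$_VAL. $def</b></list> <list><b>--$form</b> <i>$_VAL</i></list>";
        System.out.println(expand(template, hazy));
    }
}
```

Text outside the list blocks is filled in separately from the list bodies, so a
"$" that happens to occur inside a definition is not expanded a second time.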
This is still just the early design phase, so nothing is carved in stone, and
I'm open to suggestions and improvements. Most importantly, I want it to be
fairly easy to use while still being flexible enough to be able to format any
kind of dictionary in any format imaginable.
I would like to work out some mock-up databases for single-language and
multi-language dictionaries of different types to be sure the design will
accommodate all these cases.
I will also publish the Java source code so anyone who wants to can modify the
code to suit their own purposes.
> Another question came to mind: how would you implement multiple senses
> of a word? Or subentries? Presumably these would be in rows following
> the main entry, but how would the dictionary program know they were