
Re: General Purpose Dictionary Generator

From:Gary Shannon <fiziwig@...>
Date:Thursday, October 26, 2006, 2:21
--- Arthaey Angosii <arthaey@...> wrote:

> Another question came to mind: how would you implement multiple senses
> of a word? Or subentries? Presumably these would be in rows following
> the main entry, but how would the dictionary program know they were
> related?
>
> --
> AA
It might be better to have a database editor as part of the package, which would allow the flexibility of storing the database as XML instead of as spreadsheet data. Consider this English dictionary entry:

    haz•y » adj. -i•er, -i•est 1. Marked by the presence of haze.
    2. Not clearly defined; unclear or vague. --haz'i•ly adv.
    --haz'i•ness n.

Suppose we want the main entry and the alternate endings and alternate forms to be in sans-serif bold type, the parts of speech in Times Roman italic, and the definition in Times Roman. We could format the spreadsheet data like this:

    row, word,    pos,    alt,      define
    1    "haz•y", "adj.", "-i•er",  ,
    2    ,        ,       "-i•est", ,
    3    ,        ,       ,         "Marked by the presence of haze."
    4    ,        ,       ,         "Not clearly defined; unclear or vague."
    5    .... etc.

But that starts getting kind of messy. It would be better to have a database editor in outline form, where the user could put whatever fields he wanted beneath the word. Something like this:

    Eng: haz•y
        pos: adj
        alt: -i•er
        alt: -i•est
        def(1): Marked by the presence of haze.
        def(2): Not clearly defined; unclear or vague.
        form(adv): haz'i•ly
        form(n): haz'i•ness

Internally it would be stored as XML, but to the user, via the editor, it would just be outline form with whatever fields the user felt like defining.

Within the template, database fields would be identified with "$", and a few special-purpose field names like $_VAL (described below) would be defined. Now we could format the entry the way we wanted it like this:

    <b>$Eng »</b><i>$pos</i>
    <clist><b>$alt</b></clist>
    <list><b>$_VAL. $def</b></list>
    <list><b>--$form</b> <i>$_VAL</i></list>

This can be broken down as follows:

    <b>$Eng »</b>
        displays the database field "Eng" (the English word) in bold,
        followed by »

    <i>$pos</i>
        displays the database field "pos" (part of speech) in italic

    <clist><b>$alt</b></clist>
        comma-separated list of all the database fields named "alt",
        in bold

    <list><b>$_VAL. $def</b></list>
        list of all the database fields named "def", preceded by the
        value inside the parens in the database (i.e. the definition
        number $_VAL)

    <list><b>--$form</b> <i>$_VAL</i></list>
        list of all the database fields named "form", followed by the
        value inside the parens in the database (in this case, the
        part of speech)

Aside from the <list> and <clist> tags, this amounts to HTML markup of a pattern where words beginning with "$" are to be replaced by actual values from the database. <list> simply keeps churning out copies of the template it contains until it uses up all the database fields of that name in the current entry. This way a word might have 1 or 50 definitions, but the template would be the same.

This is still just the early design phase, so nothing is carved in stone, and I'm open to suggestions and improvements. Most importantly, I want it to be fairly easy to use while still being flexible enough to format any kind of dictionary in any format imaginable. I would like to work out some mock-up databases for single-language and multi-language dictionaries of different types to be sure the design will accommodate all these cases.

I will also publish the Java source code so anyone who wants to can modify the code to suit their own purposes.

--gary
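
To make the template idea above a bit more concrete, here is a rough Java sketch of how outline lines and the <list>/<clist> expansion might work. It is only an illustration under stated assumptions, not the planned implementation: the class TemplateSketch, the Field type, and the parse and render methods are all hypothetical names; the outline syntax "name(param): value" is inferred from the example entry; and <list> items are joined with a single space, which the design has not settled. The sample uses plain ASCII ("hazy" rather than "haz•y") only to sidestep source-encoding issues.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; every name in this class is hypothetical. */
class TemplateSketch {

    /** One outline line, e.g. "def(1): Marked by ..." gives name=def, param=1. */
    static final class Field {
        final String name, param, value;
        Field(String name, String param, String value) {
            this.name = name; this.param = param; this.value = value;
        }
    }

    // Assumed outline syntax: name, optional (param), colon, value.
    private static final Pattern LINE =
            Pattern.compile("\\s*(\\w+)(?:\\(([^)]*)\\))?:\\s*(.*)");
    // <list>...</list> or <clist>...</clist>; group 1 = tag, group 2 = inner template.
    private static final Pattern LIST =
            Pattern.compile("<(c?list)>(.*?)</\\1>", Pattern.DOTALL);
    // A $name placeholder.
    private static final Pattern VAR = Pattern.compile("\\$(\\w+)");

    /** Parses one outline line into a Field; a missing (param) becomes "". */
    static Field parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) throw new IllegalArgumentException("Not a field line: " + line);
        return new Field(m.group(1), m.group(2) == null ? "" : m.group(2), m.group(3));
    }

    /** Replaces each $name in piece; inside a list, current is the field being repeated. */
    static String fill(String piece, List<Field> entry, Field current) {
        Matcher m = VAR.matcher(piece);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String text = "";                       // unknown fields expand to nothing
            if (name.equals("_VAL")) {
                if (current != null) text = current.param;
            } else if (current != null && name.equals(current.name)) {
                text = current.value;
            } else {
                for (Field f : entry) {             // first field with that name wins
                    if (f.name.equals(name)) { text = f.value; break; }
                }
            }
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Expands a whole template for one dictionary entry. */
    static String render(String template, List<Field> entry) {
        Matcher m = LIST.matcher(template);
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (m.find()) {
            out.append(fill(template.substring(pos, m.start()), entry, null));
            String inner = m.group(2);
            // The list repeats over the first placeholder inside it that is not $_VAL.
            String listField = null;
            Matcher v = VAR.matcher(inner);
            while (v.find()) {
                if (!v.group(1).equals("_VAL")) { listField = v.group(1); break; }
            }
            List<String> copies = new ArrayList<>();
            for (Field f : entry) {
                if (f.name.equals(listField)) copies.add(fill(inner, entry, f));
            }
            String sep = m.group(1).equals("clist") ? ", " : " ";  // separator is an assumption
            out.append(String.join(sep, copies));
            pos = m.end();
        }
        out.append(fill(template.substring(pos), entry, null));
        return out.toString();
    }

    public static void main(String[] args) {
        List<Field> entry = new ArrayList<>();
        for (String line : Arrays.asList("Eng: hazy", "pos: adj", "alt: -ier", "alt: -iest",
                "def(1): Marked by the presence of haze.", "form(adv): hazily")) {
            entry.add(parse(line));
        }
        System.out.println(render("<b>$Eng</b> <i>$pos</i> <clist><b>$alt</b></clist> "
                + "<list>$_VAL. $def</list> <list><b>--$form</b> <i>$_VAL</i></list>", entry));
    }
}
```

In this sketch the text between lists and the text inside each list are filled in separately, so a "$" that happens to appear in a definition is never mistaken for a placeholder. The same template works whether an entry has one "def" field or fifty.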