General Purpose Dictionary Generator

From: Gary Shannon <fiziwig@...>
Date: Monday, October 23, 2006, 22:00
I've written a couple of dictionary generator programs for a few of my earlier
conlangs, and one for Larry Sulky's Elomi. The problem is every language
requires something a little different in the way of formatting a dictionary,
and everyone who wants to create a bilingual dictionary has a different idea of
how they would like to format their dictionary. What I have in mind to build is
a dictionary generator that is completely customizable and configurable to
handle any type of language and any kind of dictionary layout and formatting.

Before I start writing the code, however, I'd like to hear what people would
like to have in the way of features and capabilities.

The generator program would be written in Java so it will run on any platform:
Windows, Linux, Mac, etc.

Input comes from a spreadsheet file in CSV (Comma Separated Values) format.
Output is in the form of formatted HTML pages ready for printing or uploading
to a web site.

The column headings describe the type of data in each column, and are used in
the format patterns for formatting the HTML dictionary entries. All column
headings are optional, and all are invented by the user to suit the needs of
his specific language. For example, an English-Latin dictionary spreadsheet
might include a column with the heading "pr-pts" for the principal parts of the
Latin verb. Such a column would not exist in an English-Japanese dictionary. A
language like Swahili might have a column to name which noun class a noun
belongs to.
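
To make the input side concrete, here is a rough sketch of how the spreadsheet might be read, assuming the first CSV line holds the user-invented column headings and every later line is one dictionary entry. The class and method names (CsvSketch, splitLine, parse) are made up for illustration, and the quoting rules handled are only the basic ones (double-quoted fields and doubled quotes), not a complete CSV reader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: turn CSV lines into one map per entry, keyed by column heading.
class CsvSketch {
    // Split one CSV line into fields, honoring double-quoted fields and "" escapes.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"'); // doubled quote inside a quoted field
                    i++;
                } else if (c == '"') {
                    inQuotes = false; // closing quote
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    // The first line supplies the headings; short rows are padded with empty strings.
    static List<Map<String, String>> parse(List<String> lines) {
        List<Map<String, String>> rows = new ArrayList<>();
        if (lines.isEmpty()) return rows;
        List<String> headers = splitLine(lines.get(0));
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) continue; // skip blank lines
            List<String> cells = splitLine(line);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < headers.size(); i++) {
                String cell = i < cells.size() ? cells.get(i).trim() : "";
                row.put(headers.get(i).trim(), cell);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "latin,pr-pts,pos,english,notes",
            "amo,\"amare, amavi, amatum\",v,love,");
        System.out.println(parse(lines));
    }
}
```

Because the headings are just map keys, nothing in the reader needs to know in advance which columns a particular language uses.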

Format patterns show how and where to display the columns from the spreadsheet.

Format patterns may include HTML markup tags such as bold (<b>...</b>), italic
(<i>...</i>), large font (<big>...</big>), subscripts (<sub>...</sub>),
superscripts (<sup>...</sup>), and pseudo-HTML tags for custom typographic
styles (<type="example">...</type>), etc. A user's manual would give complete
details on how to set up and use typographic styles. (These will be translated
into cascading stylesheets and span class tags in the output HTML.)
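
As a rough illustration of the span half of that translation, the sketch below rewrites the pseudo-HTML tags into span class tags with a regular expression. The class name StyleTagSketch and the exact output form are assumptions, and generating the matching stylesheet rules is left out.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: <type="example">...</type> becomes <span class="example">...</span>.
class StyleTagSketch {
    // Matches an opening pseudo-tag and captures the style name.
    private static final Pattern OPEN =
        Pattern.compile("<type=\"([A-Za-z][A-Za-z0-9_-]*)\">");

    static String toSpans(String pattern) {
        // $1 in the replacement refers to the captured style name.
        String withOpen = OPEN.matcher(pattern).replaceAll("<span class=\"$1\">");
        return withOpen.replace("</type>", "</span>");
    }

    public static void main(String[] args) {
        System.out.println(toSpans("<type=\"example\">$notes</type>"));
    }
}
```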

The number of dictionary entries, or the number of printed lines, per HTML page
can be set by the user.
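
A minimal sketch of the entries-per-page case, assuming the formatted entries are already in a list. The names PagingSketch and paginate are invented here, and counting printed lines instead of entries would need a different measure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: split a list of entries into pages of at most perPage entries.
class PagingSketch {
    static <T> List<List<T>> paginate(List<T> entries, int perPage) {
        if (perPage <= 0) {
            throw new IllegalArgumentException("perPage must be positive");
        }
        List<List<T>> pages = new ArrayList<>();
        for (int i = 0; i < entries.size(); i += perPage) {
            int end = Math.min(i + perPage, entries.size());
            pages.add(new ArrayList<>(entries.subList(i, end))); // copy each slice
        }
        return pages;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(paginate(entries, 2));
    }
}
```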

An HTML template file can be used to supply the header, footer, title, style
sheets, etc. of the dictionary pages. This allows the user to design the
overall look and feel of the dictionary pages, including any graphics or logos
he or she might want to include on the dictionary pages. A couple of sample
template files would be included to get you started.
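
One way the template step could work, purely as a sketch: the template file contains placeholder markers that the generator swaps for the page title and the formatted entries. The markers {{title}} and {{entries}} and the class TemplateSketch are hypothetical; no placeholder syntax has been decided.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: fill a user-supplied HTML template with a title and entries.
class TemplateSketch {
    static String fill(String template, String title, List<String> entries) {
        StringBuilder body = new StringBuilder();
        for (String entry : entries) {
            body.append("<p>").append(entry).append("</p>\n"); // one paragraph per entry
        }
        // {{title}} and {{entries}} are assumed marker names, not a settled format.
        return template.replace("{{title}}", title)
                       .replace("{{entries}}", body.toString());
    }

    public static void main(String[] args) {
        String template =
            "<html><head><title>{{title}}</title></head><body>\n{{entries}}</body></html>";
        System.out.println(fill(template, "Latin-English", Arrays.asList("<b>amo</b>")));
    }
}
```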

Two separate sets of HTML pages are generated, one for language A to language B
and one for the other way around.
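
Since both sets come from the same spreadsheet, the two directions could simply be the same rows sorted on different columns. The sketch below assumes each row is a map from column heading to cell text. SortSketch, sortedBy and the row helper are made-up names, and plain case-insensitive ordering stands in for whatever collation a given language really needs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: order the same rows by either headword column.
class SortSketch {
    static List<Map<String, String>> sortedBy(List<Map<String, String>> rows, String column) {
        List<Map<String, String>> copy = new ArrayList<>(rows); // leave the input untouched
        Comparator<Map<String, String>> byColumn = Comparator.comparing(
            (Map<String, String> row) -> row.getOrDefault(column, ""),
            String.CASE_INSENSITIVE_ORDER);
        copy.sort(byColumn);
        return copy;
    }

    // Small helper to build a two-column row for the demo.
    static Map<String, String> row(String latin, String english) {
        Map<String, String> r = new HashMap<>();
        r.put("latin", latin);
        r.put("english", english);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows =
            Arrays.asList(row("canis", "dog"), row("amo", "love"), row("bellum", "war"));
        System.out.println(sortedBy(rows, "latin"));
        System.out.println(sortedBy(rows, "english"));
    }
}
```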

Format Patterns

Given a spreadsheet with these named columns:

"latin" is the spreadsheet column header of the column containing the Latin
word.
"pr-pts" is the spreadsheet column header of the column containing the
principal parts of Latin verbs.
"pos" is the spreadsheet column header for the parts of speech column.
"english" is the spreadsheet column header for the English word.
"notes" is the spreadsheet column header for the usage notes column.

The formatting patterns might look something like this:

Items starting with the dollar sign "$" are replaced by the data from the
spreadsheet column with that name. For example, "<b>($pos)</b>" in the format
pattern will display data from the spreadsheet column called "pos" and will
display it in bold type, and enclose it in parentheses.
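
The "$" replacement could be sketched along these lines, assuming a row is a map from column heading to cell text. The class name SubstituteSketch and the rule for what counts as a column name (letters, digits, underscores, and hyphens between them, so that "pr-pts" works) are assumptions. The values are inserted as-is, with no HTML escaping.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: replace each $name in a format pattern with that column's data.
class SubstituteSketch {
    // "$" followed by a name; hyphens are allowed only between name characters.
    private static final Pattern COLUMN_REF =
        Pattern.compile("\\$([A-Za-z0-9_]+(?:-[A-Za-z0-9_]+)*)");

    static String substitute(String pattern, Map<String, String> row) {
        Matcher m = COLUMN_REF.matcher(pattern);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Unknown columns are treated as empty in this sketch.
            String value = row.getOrDefault(m.group(1), "");
            // quoteReplacement keeps "$" or "\" in the data from being misread.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        row.put("pos", "n");
        System.out.println(substitute("<b>($pos)</b>", row));
    }
}
```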

A sample Latin to English entry format pattern might be:

	<b>$latin</b> <i>$pr-pts</i> - [$pos] $english ($notes)

This would show the Latin word in bold type, the principal parts (if any) in
italics, a hyphen, the part of speech enclosed in square brackets, the English
word or words and any usage notes enclosed in parens.

Sample English to Latin format pattern:

	<b>$english</b> - [$pos] $latin ($notes)

This would show the English word in bold type, a hyphen, the part of speech
enclosed in square brackets, the Latin word or words and any usage notes
enclosed in parens.

In this example the user defines the format of the Latin to English entries and
the English to Latin entries as shown. If any column is empty then that data
element will be left out of the formatted line. For example, if the "notes"
column contains the comment "When addressing a superior." then that comment
will be included at the end of the entry, and be enclosed in parentheses as
shown in the format. If that column is empty then neither the notes, nor the
surrounding parentheses will be shown.
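
How the generator decides which decoration "surrounds" an empty column is still open. One simple heuristic, shown here only as a sketch, is to treat each whitespace-separated piece of the pattern as a unit and drop the whole piece when any column it refers to is empty. The class name OmitEmptySketch is made up, and the heuristic would not cope with decoration that itself contains spaces.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: drop a pattern piece, decoration included, when its column is empty.
class OmitEmptySketch {
    private static final Pattern COLUMN_REF =
        Pattern.compile("\\$([A-Za-z0-9_]+(?:-[A-Za-z0-9_]+)*)");

    static String format(String pattern, Map<String, String> row) {
        List<String> kept = new ArrayList<>();
        // Assumption: each whitespace-separated piece is one unit of text plus decoration.
        for (String token : pattern.trim().split("\\s+")) {
            Matcher m = COLUMN_REF.matcher(token);
            StringBuffer filled = new StringBuffer();
            boolean hasEmpty = false;
            while (m.find()) {
                String value = row.getOrDefault(m.group(1), "").trim();
                if (value.isEmpty()) hasEmpty = true;
                m.appendReplacement(filled, Matcher.quoteReplacement(value));
            }
            m.appendTail(filled);
            if (!hasEmpty) kept.add(filled.toString()); // pieces with no column refs are kept
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        row.put("english", "love");
        row.put("pos", "v");
        row.put("latin", "amo");
        row.put("notes", "");
        System.out.println(format("<b>$english</b> - [$pos] $latin ($notes)", row));
    }
}
```

With the English to Latin pattern above, an empty "notes" cell removes "($notes)" entirely, while a filled one keeps the parentheses around the text.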

