
Re: Dictionary Programs?

From: H. S. Teoh <hsteoh@...>
Date: Wednesday, August 28, 2002, 20:45

On Tue, Aug 27, 2002 at 03:04:58PM +0200, BP Jonsson wrote:
> At 21:46 2002-08-26 -0400, H. S. Teoh wrote:
>
> >problem. So I wrote a little program that parses the LaTeX files and
> >performs such checks as verifying dictionary order, verifying that
> >cross-references exist, etc..
>
> What technique do you use to check/change dictionary order? I'd really
> need to be able to do that.

Well, I don't know how helpful this would be to anyone else, but basically, I needed to write a library for parsing Ebisedian orthography (it is very non-trivial). I designed it so that the parser will encode Ebisedian syllables within 24-bit integers in such a way that their numerical values reflect alphabetical order. Then it is just a simple matter of doing a lexicographic ordering on sequences of integers. (Of course, the real story is a bit more complicated, as Ebisedian alphabetical ordering has a monkey wrench: stressed syllables do not count in word ordering except when the words are otherwise identical, in which case they are sorted by the position of the first stressed syllable. This needed a little tweaking in my comparison algorithm, but otherwise, it is basically a lexicographic comparison.)
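
The comparison described above might look roughly like the following C sketch. It is only an illustration under stated assumptions, not the actual Ebisedian library: the names `syllable_t`, `make_syllable` and `compare_words` are hypothetical, as is the choice to keep the stress flag in a bit above the 24-bit alphabetical rank. It also assumes that an earlier first-stressed position sorts first and that a word with no stressed syllable sorts after its stressed twins; the post does not specify either.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the low 24 bits hold the syllable's alphabetical
 * rank; one bit above them flags stress. Only the "24-bit integer whose
 * value reflects alphabetical order" part comes from the post. */
#define SYL_RANK_MASK 0x00FFFFFFu
#define SYL_STRESSED  0x01000000u

typedef uint32_t syllable_t;

/* Build a syllable code from its alphabetical rank and a stress flag. */
syllable_t make_syllable(uint32_t rank, int stressed)
{
    assert(rank <= SYL_RANK_MASK);
    return rank | (stressed ? SYL_STRESSED : 0u);
}

/* Index of the first stressed syllable, or n if the word has none. */
static size_t first_stress(const syllable_t *w, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (w[i] & SYL_STRESSED)
            return i;
    return n;
}

/* Returns <0, 0 or >0, like strcmp. Stress is ignored during the main
 * lexicographic pass and used only as a tie-breaker. */
int compare_words(const syllable_t *a, size_t na,
                  const syllable_t *b, size_t nb)
{
    size_t n = na < nb ? na : nb;

    /* Plain lexicographic comparison on the alphabetical ranks. */
    for (size_t i = 0; i < n; i++) {
        uint32_t ra = a[i] & SYL_RANK_MASK;
        uint32_t rb = b[i] & SYL_RANK_MASK;
        if (ra != rb)
            return ra < rb ? -1 : 1;
    }

    /* A proper prefix sorts before the longer word. */
    if (na != nb)
        return na < nb ? -1 : 1;

    /* Otherwise identical: order by position of the first stress
     * (assumed direction: earlier stress first). */
    size_t sa = first_stress(a, na);
    size_t sb = first_stress(b, nb);
    if (sa != sb)
        return sa < sb ? -1 : 1;
    return 0;
}
```

Checking dictionary order then amounts to walking the entries in file order and flagging any adjacent pair for which `compare_words` returns a positive value.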

> FWIW I keep my dictionaries in ordinary database files, using calculation
> fields to export formatted HTML or TeX, or just export comma separated text
> files and do the formatting in a Perl program. I do vocabulary generation
> and sound changes with Perl too, BTW, in spite of being no programming
> addict.

[snip]

Yeah, Perl is awesome. :-) Unfortunately, in the case of Ebisedian, I really needed to write it in C (mainly because flex/lex generates C lexers) because Ebisedian orthography was complex enough to require a full-strength lexical analyser.

It is interesting to note that, because of the way I assign numerical values to syllables in alphabetical order, the Flex input file itself is generated by a Perl script that pre-calculates syllabic token values. There was no way I would've done it by hand, as that would be extremely error-prone and the resulting errors would probably escape notice for a long time. (There are about 250 distinct syllables that I would've had to assign precise numerical values to.)

For that matter, many parts of my Ebisedian tools, though written in C, have portions generated by Perl scripts at build time. For example, to avoid losing my sanity over having to escape backslashes in C strings containing LaTeX template commands, I wrote a Perl script that reads in a template definition file, escapes the dangerous characters for me, inserts the appropriate C syntax, and produces something the C compiler is happy with. Many repetitious parts of the C code, such as the vowel/consonant tables, are likewise defined in a human-readable data file, which gets digested by a Perl script that then spits out the appropriate C representations.

The beautiful LaTeX orthography that convinced hardliners like Jesse Bangs of Ebisedian's worth didn't come for free. ;-) A lot of effort went into producing these "infrastructure" tools that make it possible for me to work with Ebisedian without losing my sanity/patience. :-P

T

-- 
Real Programmers use "cat > a.out".