Re: Dictionary Programs?
From: H. S. Teoh <hsteoh@...>
Date: Wednesday, August 28, 2002, 20:45
On Tue, Aug 27, 2002 at 03:04:58PM +0200, BP Jonsson wrote:
> At 21:46 2002-08-26 -0400, H. S. Teoh wrote:
>
> >problem. So I wrote a little program that parses the LaTeX files and
> >performs such checks as verifying dictionary order, verifying that
> >cross-references exist, etc.
>
> What technique do you use to check/change dictionary order? I'd really
> need to be able to do that.
Well, I don't know how helpful this would be to anyone else, but
basically, I needed to write a library for parsing Ebisedian orthography
(it is very non-trivial). I designed it so that the parser will encode
Ebisedian syllables within 24-bit integers in such a way that their
numerical values reflect alphabetical order. Then it is just a simple
matter of doing a lexicographic ordering on sequences of integers.
(Of course, the real story is a bit more complicated, as Ebisedian
alphabetical ordering has a monkey wrench: stressed syllables do not count
in word ordering except when the words are otherwise identical, in which
case they are sorted by the position of the first stressed syllable. This
needed a little tweaking in my comparison algorithm, but otherwise, it is
basically a lexicographic comparison.)
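A comparison of this kind can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the actual Ebisedian library: the names word_cmp, first_stress, SYL_MASK and SYL_STRESS are hypothetical, stress is assumed to live in a flag bit above the 24-bit syllable value, and an earlier first-stress position is assumed to sort first (a word with no stress at all sorts last among otherwise identical words).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout (not from the original post): the low 24 bits hold the
 * syllable's alphabetical rank, and bit 24 marks a stressed syllable. */
#define SYL_MASK   0x00FFFFFFu
#define SYL_STRESS 0x01000000u

/* Index of the first stressed syllable, or n if none is stressed. */
static size_t first_stress(const uint32_t *w, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (w[i] & SYL_STRESS)
            return i;
    return n;
}

/* Compare two words given as arrays of encoded syllables.
 * Returns <0, 0 or >0, in the manner of strcmp. */
int word_cmp(const uint32_t *a, size_t na, const uint32_t *b, size_t nb)
{
    size_t i, sa, sb;
    size_t n = na < nb ? na : nb;

    /* Pass 1: plain lexicographic comparison, ignoring stress. */
    for (i = 0; i < n; i++) {
        uint32_t x = a[i] & SYL_MASK;
        uint32_t y = b[i] & SYL_MASK;
        if (x != y)
            return x < y ? -1 : 1;
    }
    if (na != nb)
        return na < nb ? -1 : 1;   /* a proper prefix sorts first */

    /* Pass 2: words are otherwise identical, so break the tie on the
     * position of the first stressed syllable (assumed: earlier first). */
    sa = first_stress(a, na);
    sb = first_stress(b, nb);
    if (sa != sb)
        return sa < sb ? -1 : 1;
    return 0;
}
```

Keeping stress out of the low 24 bits means the first pass never sees it, so the tie-break only runs when the words are otherwise identical.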
> FWIW I keep my dictionaries in ordinary database files, using calculation
> fields to export formatted HTML or TeX, or just export comma separated text
> files and do the formatting in a Perl program. I do vocabulary generation
> and sound changes with Perl too, BTW, in spite of being no programming
> addict.
[snip]
Yeah, Perl is awesome. :-) Unfortunately, in the case of Ebisedian, I
really needed to write it in C (mainly because flex/lex generates C
lexers) because Ebisedian orthography was complex enough to require a
full-strength lexical analyser.
It is interesting to note that, because of the way I assign numerical
values to syllables in alphabetical order, the Flex input file itself is
generated by a Perl script that pre-calculates syllabic token values.
There was no way I would've done it by hand, as that would be extremely
error-prone and resultant errors would probably escape notice for a long
time. (There are about 250 distinct syllables that I would've had to
assign precise numerical values to.)
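To give an idea of what such a precomputed value might look like, here is a minimal sketch in C. The field layout is purely an assumption for illustration: it supposes a syllable can be split into three parts (called onset, vowel and coda here), each already assigned a rank from 0 to 255 in alphabetical order. The real Ebisedian encoding may be organised quite differently, and syl_pack is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical layout: 8 bits each for the alphabetical rank of the
 * onset, the vowel and the coda, most significant part first. */
uint32_t syl_pack(unsigned onset, unsigned vowel, unsigned coda)
{
    return ((uint32_t)(onset & 0xFFu) << 16)
         | ((uint32_t)(vowel & 0xFFu) << 8)
         |  (uint32_t)(coda  & 0xFFu);
}
```

Because the most significant field is compared first, ordinary integer comparison of two packed values gives the same result as comparing the three ranks in order, which is what lets the sort treat syllables as plain numbers.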
For that matter, many parts of my Ebisedian tools, though written in C,
have portions generated by Perl scripts at build time. For
example, to avoid losing my sanity over having to escape backslashes in C
strings containing LaTeX template commands, I wrote a Perl script that
reads in a template definition file, escapes the dangerous characters for
me, inserts the appropriate C syntax, and produces something the C compiler
would be happy with. Many repetitious parts of the C code, such as the
vowel/consonant tables, are likewise defined in a human-readable data
file, which gets digested by a Perl script which then spits out
appropriate C representations.
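The escaping step itself is simple enough to sketch. The version below is written in C rather than Perl, and is only an illustration of the transformation, not the actual build script: c_escape is a hypothetical name, and the set of escaped characters (backslash, double quote, newline, tab) is an assumption about what counts as "dangerous".

```c
#include <stddef.h>

/* Write src into dst as a double-quoted C string literal.
 * Returns the number of characters written (excluding the NUL),
 * or (size_t)-1 if dst (capacity cap) is too small. */
size_t c_escape(const char *src, char *dst, size_t cap)
{
    size_t n = 0;
    const char *p;

    if (cap < 3)                    /* room for "" plus the NUL */
        return (size_t)-1;
    dst[n++] = '"';
    for (p = src; *p != '\0'; p++) {
        const char *esc = NULL;     /* two-character escape, if any */
        switch (*p) {
        case '\\': esc = "\\\\"; break;
        case '"':  esc = "\\\""; break;
        case '\n': esc = "\\n";  break;
        case '\t': esc = "\\t";  break;
        default:   break;
        }
        /* Keep room for this item plus the closing quote and the NUL. */
        if (n + (esc ? 2 : 1) + 2 > cap)
            return (size_t)-1;
        if (esc) {
            dst[n++] = esc[0];
            dst[n++] = esc[1];
        } else {
            dst[n++] = *p;
        }
    }
    dst[n++] = '"';
    dst[n] = '\0';
    return n;
}
```

Given a template line such as \textbf{x}, this produces "\\textbf{x}", which can be pasted straight into a generated C source file.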
The beautiful LaTeX orthography that convinced hardliners like Jesse Bangs
of Ebisedian's worth didn't come for free. ;-) A lot of effort went into
producing these "infrastructure" tools that make it possible for me to
work with Ebisedian without losing my sanity/patience. :-P
T
--
Real Programmers use "cat > a.out".