Re: OT: CXS chart and machine-readable Unicode->CXS mappings
From: Henrik Theiling <theiling@...>
Date: Tuesday, March 9, 2004, 17:48
Hi!
Mark wrote:
> I've wrapped a module around the Perl so that you can simply do
>
> use CXS;
>...
Oh, nice! I incorporated the code into my conversion script and renamed
the resulting file to CXS.pm. This way, the page will always carry the
newest bug-fixed or otherwise updated version as a Perl module.
I also made the C code more usable by wrapping a 'module' around it
(or what C thinks a module is).
Your script has one problem, though it is really due to the data: the
hash table cannot simply be reversed, because several Unicode
characters map to the same CXS string. This comes mainly from my
including both the spacing modifier letters *and* the combining
versions of the accents. The combining versions should be preferred,
unless the script sees a diacritic with no base character to attach
to, in which case the isolated (spacing) form should be returned.
This is tricky. An easier way would be to ignore a modifier letter
whenever a combining version exists.
I will try to fix that by providing the module with more information
about which entries should be considered primary.
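A minimal C sketch of the easier approach described above, assuming the mapping is available as a table of (code point, CXS string) pairs. When several code points share a CXS string, the reverse lookup prefers any entry outside the Spacing Modifier Letters block (U+02B0..U+02FF) and falls back to the modifier letter only if nothing else maps there. The `entry` struct, the `cxs_to_unicode` function and the table rows are hypothetical illustrations, not taken from CXS.pm or the actual C module; the CXS strings in particular are assumptions.

```c
#include <string.h>

/* Hypothetical subset of a Unicode -> CXS table. These rows are
   assumptions for illustration, not copied from the real table. */
struct entry {
    unsigned long cp;   /* Unicode code point */
    const char *cxs;    /* CXS transcription (assumed values) */
};

static const struct entry table[] = {
    { 0x0061, "a"  },   /* LATIN SMALL LETTER A */
    { 0x02DC, "~"  },   /* MODIFIER LETTER SMALL TILDE (spacing form) */
    { 0x0303, "~"  },   /* COMBINING TILDE (preferred form) */
    { 0x02C8, "\"" },   /* MODIFIER LETTER VERTICAL LINE, no combining twin */
};

#define TABLE_LEN (sizeof table / sizeof table[0])

/* True if cp lies in the Spacing Modifier Letters block. */
static int is_modifier_letter(unsigned long cp)
{
    return cp >= 0x02B0 && cp <= 0x02FF;
}

/* Reverse lookup CXS -> Unicode. Any non-modifier entry wins; a
   spacing modifier letter is returned only if no other entry has the
   same CXS string. Returns 0 if the string is not in the table. */
unsigned long cxs_to_unicode(const char *cxs)
{
    unsigned long fallback = 0;
    size_t i;

    for (i = 0; i < TABLE_LEN; i++) {
        if (strcmp(table[i].cxs, cxs) != 0)
            continue;                   /* different CXS string */
        if (!is_modifier_letter(table[i].cp))
            return table[i].cp;         /* preferred entry found */
        if (fallback == 0)
            fallback = table[i].cp;     /* remember first modifier letter */
    }
    return fallback;
}
```

With this table, `cxs_to_unicode("~")` yields U+0303 (the combining tilde wins over U+02DC), while `cxs_to_unicode("\"")` still yields U+02C8, because no non-modifier entry maps to that string. The context-sensitive case (a diacritic with nothing to attach to) is not handled here; that would need a look at the preceding character during conversion.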
Bye,
Henrik