Chinese Character Decomposition
From: Henrik Theiling <theiling@...>
Date: Wednesday, June 11, 2003, 23:43
Hi!
Is there any way to compute a complete decomposition of Chinese
characters using, e.g., the Unihan data from the Unicode distribution?
I only found the standard radical + additional strokes decomposition
(kRSUnicode entry). I'm interested in the whole rest of the
character, though.
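
For readers who have not looked at the Unihan files: a minimal sketch of
reading the kRSUnicode entry, written in Python rather than Perl. It assumes
the tab-separated "U+XXXX<TAB>field<TAB>value" line format; the function
names and the hand-typed sample rows are only illustrative and should be
checked against the real data.

```python
import re

# One kRSUnicode value: radical number, optional apostrophes (variant
# radical form), a dot, then the residual (additional) stroke count.
RS_VALUE = re.compile(r"^(\d{1,3})('*)\.(-?\d{1,2})$")


def parse_unihan_field(lines, field):
    """Return {code point (int): raw value string} for one Unihan field."""
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # not a "code point, field, value" row
        code_point, key, value = parts
        if key == field:
            table[int(code_point[2:], 16)] = value  # drop the "U+" prefix
    return table


def parse_rs(value):
    """Split a kRSUnicode value into (radical number, residual strokes).

    A field may hold several space-separated values; only the first is used.
    """
    first = value.split()[0]
    match = RS_VALUE.match(first)
    if match is None:
        raise ValueError(f"unexpected kRSUnicode value: {value!r}")
    return int(match.group(1)), int(match.group(3))


# Hand-typed sample rows (assumption: they match the real Unihan entries).
sample = [
    "# sample rows for illustration",
    "U+5171\tkRSUnicode\t12.4",
    "U+5171\tkTotalStrokes\t6",
]

rs_table = parse_unihan_field(sample, "kRSUnicode")
radical, residual = parse_rs(rs_table[0x5171])
print(radical, residual)
```

With these sample rows, U+5171 comes out as radical 12 with 4 additional
strokes, which is exactly the one-step decomposition and nothing more.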
Currently, I've written a Perl script that does the decomposition for
one step and then lists *all* characters whose total stroke count equals
the number of strokes that remain, sorted by radical. I can then try to
find the next decomposition step manually. If a valid character happens
to remain, I can choose it and do the next step automatically again,
until nothing remains. If no valid character remains, I have to do the
remaining decomposition completely by hand. Not very satisfactory...
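
The candidate search described above can be sketched roughly as follows.
This is a Python illustration, not the Perl script itself; the table layout,
the function name and the handful of hand-typed rows are assumptions made
for the example.

```python
from collections import defaultdict

# Hypothetical in-memory table built from the Unihan data:
# character -> (radical number, residual strokes, total strokes).
# The rows are typed in by hand for illustration only.
TABLE = {
    "共": (12, 4, 6),
    "八": (12, 0, 2),
    "一": (1, 0, 1),
    "木": (75, 0, 4),
    "水": (85, 0, 4),
    "火": (86, 0, 4),
}


def remainder_candidates(char, table):
    """List characters that could be what is left after removing the radical.

    A candidate is any other character whose total stroke count equals the
    residual stroke count of `char`. The result is grouped by the
    candidate's own radical number, in ascending order.
    """
    _, residual, _ = table[char]
    by_radical = defaultdict(list)
    for other, (radical, _, total) in table.items():
        if other != char and total == residual:
            by_radical[radical].append(other)
    return dict(sorted(by_radical.items()))


print(remainder_candidates("共", TABLE))
```

A matching stroke count is only a necessary condition: every candidate
still has to be compared with the remaining shape by eye, which is the
manual part. In this tiny table all three four-stroke candidates come up
for U+5171, and none of them is actually its remainder.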
Another, more specialised question: concerning the character gong4 (in
Mandarin) with six strokes ('to share'), radical ba1 ('eight'),
U+5171: what is the remaining decomposition after taking away ba1? No
valid character remains after removing ba1. I decomposed it as
'eight' + 'one' + 'grass' (bottom to top). Would that be correct?
(That does make sense with respect to 'to share'.)
Needless to say, I need all this for a new conlang project that will
use Chinese characters for writing.
Bye,
Henrik
PS: Would this be considered off-topic? It has to do with languages
(though natlangs), so I omitted OT in the header.
PPS: I *love* many postings! :-) I sometimes don't know what to
do if I want to read conlang ten minutes after I last did and
there are no new postings! :-)