
Chinese Character Decomposition

From: Henrik Theiling <theiling@...>
Date: Wednesday, June 11, 2003, 23:43

Is there any way to compute a complete decomposition of Chinese
characters using e.g. the unihan data from the Unicode distribution?
I only found the standard radical + additional strokes decomposition
(kRSUnicode entry).  I'm interested in the whole rest of the
character, though.

Currently, I've written a Perl script that does the decomposition for
one step and then lists *all* characters that have exactly the
remaining number of strokes, sorted by radical.  I can then try to find
the next decomposition step manually.  If a valid character happens to
remain, I can choose that and do the next step automatically again
until nothing remains.  If no valid character remains, I have to do the
remaining decomposition completely by hand.  Not very satisfactory...
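
To make the one-step procedure concrete, here is a minimal sketch of it.
It is written in Python rather than Perl and is not the actual script.
It assumes the tab-separated Unihan layout (code point, field name,
value) and uses the kRSUnicode and kTotalStrokes fields.  The function
names and the four-character sample are made up for illustration, and
the sample values were typed in by hand, so check them against the real
Unihan file before relying on them.

```python
from collections import defaultdict

# Hand-typed illustrative sample in the Unihan layout:
# code point <TAB> field <TAB> value.  Not the real data file.
SAMPLE = """\
U+4E00\tkRSUnicode\t1.0
U+4E00\tkTotalStrokes\t1
U+4E95\tkRSUnicode\t7.2
U+4E95\tkTotalStrokes\t4
U+5171\tkRSUnicode\t12.4
U+5171\tkTotalStrokes\t6
U+5EFF\tkRSUnicode\t55.1
U+5EFF\tkTotalStrokes\t4
"""


def parse_unihan(text):
    """Return (rs, strokes): char -> (radical, residual) and char -> total."""
    rs, strokes = {}, {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        code_point, field, value = line.split("\t", 2)
        char = chr(int(code_point[2:], 16))  # "U+5171" -> the character
        first = value.split()[0]  # a field may hold several values; take the first
        if field == "kRSUnicode":
            radical, residual = first.split(".")
            # A trailing apostrophe marks a simplified radical form; drop it.
            rs[char] = (int(radical.rstrip("'")), int(residual))
        elif field == "kTotalStrokes":
            strokes[char] = int(first)
    return rs, strokes


def candidates(char, rs, strokes):
    """One decomposition step: take the radical away, then collect every
    character whose total stroke count equals the residual stroke count,
    grouped by radical number."""
    radical, residual = rs[char]
    by_radical = defaultdict(list)
    for other, total in strokes.items():
        if total == residual and other in rs:
            by_radical[rs[other][0]].append(other)
    return radical, residual, dict(sorted(by_radical.items()))


rs, strokes = parse_unihan(SAMPLE)
print(candidates("\u5171", rs, strokes))
```

The candidate list only matches on stroke count, so picking the right
remainder (or noticing that none of the candidates fits) is still the
manual part.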

Another, more specialised question: concerning the character gong4 (in
Mandarin) with six strokes ('to share'), radical ba1 ('eight'),
U+5171: what is the remaining decomposition after taking away ba1?  No
valid character remains after removing ba1.  I decomposed it as
'eight' + 'one' + 'grass' (bottom to top).  Would that be correct?
(That does make sense wrt. 'to share'.)

Needless to say, I need all this for a new conlang project that will
use Chinese characters for writing.


PS: Would this be considered off-topic?  It has to do with languages
    (though natlangs), so I omitted OT in the header.

PPS: I *love* many postings! :-)  I sometimes don't know what to
     do if I want to read conlang ten minutes after I last did and
     there are no new postings! :-)


Amanda Babcock <langs@...>