Theiling Online: Conlang Mailing List HQ

Re: TECH: Unicode Private Use Area

From: Mark J. Reed <markjreed@...>
Date: Saturday, February 2, 2008, 13:04
There's a very important difference between using other established
characters and using the UPUA.  If I see a web page that specifies in
its CSS a particular font face I don't have, but otherwise appears to
be normal Latin alphabet characters (albeit spelling words in some
language I don't speak), I would expect that those Latin characters
are the intended characters, even if they don't look exactly right in
my browser.

Unicode is all about representing data, not displaying it to humans.  It
is predicated on the assumption that characters (letters, numbers,
punctuation marks, etc.) have a conceptual existence independent of
their physical manifestations.  Philosophically, one may argue this
point, but it is at least approximately true given the sheer number of
different ways you can display, e.g., U+0041 LATIN CAPITAL LETTER A.
A few from Douglas Hofstadter:

http://www.aare.edu.au/02pap/Image501.gif

Modern web design is predicated on this same distinction, with the
separation of font choice (via CSS) from the specification of the text
to display in that font (via HTML).

So if a file claims to contain Unicode text, then anywhere the scalar
value U+0041 (hexadecimal 41, decimal 65) appears in that file, it
must *always* mean "uppercase A in the Latin alphabet".   Fonts don't
enter into the equivalence.
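A small Python illustration of the point above, using only the standard library's unicodedata module. A Python string holds code points and nothing else: there is no font attached, so the identity of the character comes entirely from the number.

```python
import unicodedata

# Hexadecimal 41 and decimal 65 are the same scalar value.
assert 0x41 == 65

ch = chr(0x41)
print(ch)                    # A
print(f"U+{ord(ch):04X}")    # U+0041
print(unicodedata.name(ch))  # LATIN CAPITAL LETTER A
```

Whatever font a browser later picks for this string, the data itself still says U+0041.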

The point of Unicode was to create room for all the characters out
there to have their very own textual representation.  Not just as
images or glyphs in some custom font, but as assigned places in a regular
old stream of text with no markup or style information associated with
it at all.

They recognized that there are probably too many characters out there
for Unicode to encode them all, which is why there's the UPUA.  Two
people can agree to use portions of it for characters otherwise not in
Unicode, and even someone who doesn't know how they're using it can at
least tell that (1) it's text and (2) the characters aren't anywhere
else in Unicode.  Whatever that thing is, it's not an A.  That is a big
improvement over the "well, yeah, it's an A, but not if you use the
right font!" approach.
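Here is a minimal sketch of what "someone who doesn't know how they're using it" can still determine, using Python's standard unicodedata module. Private Use code points carry the General Category "Co" and have no character name. The describe() helper is a hypothetical name chosen for this illustration.

```python
import unicodedata

def describe(ch: str) -> str:
    """Report what a reader with no private agreement can tell about ch."""
    cp = ord(ch)
    if unicodedata.category(ch) == "Co":
        # Private Use: valid text, but Unicode gives it no name or meaning.
        return f"U+{cp:04X}: private use (not an A, not any standard character)"
    return f"U+{cp:04X}: {unicodedata.name(ch, '<unnamed>')}"

# One BMP private-use code point and one from plane 15.
for ch in ["A", "\uE000", "\U000F0000"]:
    print(describe(ch))

# A private-use character is still ordinary text: it encodes to UTF-8
# like any other code point.
print("\uE000".encode("utf-8"))  # b'\xee\x80\x80'
```

The reader cannot say what U+E000 means, but can say for certain that it is text and that it is not U+0041.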

The CSUR (ConScript Unicode Registry) is just a place that makes it easy
for one set of people (conlangers) to get together and agree on one way
to use the UPUA with each other within their group.  It in no way keeps
other groups of people (or differently-minded conlangers) from using it
differently.  But it makes it easy to put conlang text on sites like
FrathWiki, which would otherwise have to house large image collections
to display it all.
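A rough Python sketch of the idea that a private-use agreement is just a lookup table shared by one group. The two CSUR ranges below (Tengwar at U+E000-U+E07F, Klingon at U+F8D0-U+F8FF) are my understanding of the registry's allocations and should be checked against the CSUR itself; OTHER_GROUP and the interpret() helper are hypothetical, made up for illustration.

```python
import unicodedata

# Assumed subset of CSUR allocations; verify against the registry.
CSUR_SUBSET = {
    (0xE000, 0xE07F): "Tengwar",
    (0xF8D0, 0xF8FF): "Klingon (pIqaD)",
}

# Hypothetical different group that uses the same code points differently.
OTHER_GROUP = {
    (0xE000, 0xE0FF): "Hypothetical in-house script",
}

def interpret(ch: str, agreement: dict) -> str:
    """Read ch using standard Unicode, falling back to a private agreement."""
    cp = ord(ch)
    if unicodedata.category(ch) != "Co":
        # Not private use: the meaning is fixed by Unicode for everyone.
        return "standard Unicode: " + unicodedata.name(ch, "<unnamed>")
    for (first, last), script in agreement.items():
        if first <= cp <= last:
            return script
    return "private use, no agreed meaning"

print(interpret("\uE000", CSUR_SUBSET))  # Tengwar
print(interpret("\uE000", OTHER_GROUP))  # Hypothetical in-house script
print(interpret("A", CSUR_SUBSET))       # standard Unicode: LATIN CAPITAL LETTER A
```

The same code point U+E000 gets different readings under different agreements, while U+0041 means the same thing under all of them.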