Re: Inserting accent marks
From: Lars Henrik Mathiesen <thorinn@...>
Date: Tuesday, January 8, 2002, 18:11
> Date: Tue, 8 Jan 2002 21:11:11 +1100
> From: Tristan Alexander McLeay <anstouh@...>
> On Mon, 7 Jan 2002, David Starner wrote:
> > [Quoting someone else:]
> > >Many accented chars. are available in ASCII (nos. up to 255),
> > >though not all of them transmit to everyone, e.g. Alt+0154 š
> > >(s-hacek).
> > That's because they aren't in ASCII - ASCII is a 7-bit code,
> > 0-177, including characters only for English. Latin 1 added
> > accented characters for western Europe, and Microsoft added stuff
> > like š and the Euro into the space Latin-1 used for control
> > characters that Windows doesn't need.
> MS didn't add any char at 353. MS added a few chars into the upper
> control characters section of Latin 1 (ISO8859-1) creating
> WinLatin-1. I get the s hacek as control U grave. (That is ^Ugrave,
> except when I move the cursor across it, it skips the U.)

OK, enough guesswork --- please cut out the following and save it for
when this discussion breaks out again in about a month...

This is how the Alt-0 codes in Windows and the numeric entities
like &#353; (š) in HTML break down:

Alt-0000 - Alt-0127   Normal ASCII -- no need to use special codes.

Alt-0128              The Euro sign, added to Windows Latin-1 by
                      Microsoft a few years ago, without any version
                      indication. (See also next entry).

Alt-0129 - Alt-0159   Other characters added to the original Windows
                      Latin-1 superset (codepage 1252). These will
                      normally only display on Windows systems ---
                      especially since Windows applications tend to lie
                      and claim that text containing them is real
                      Latin-1, or even plain ASCII. (If labelled
                      correctly, other systems do have a chance of
                      converting them to Unicode or something else
                      they can display). These numeric values make no
                      sense in HTML --- don't use them.

Alt-0160 - Alt-0255   Latin-1 proper. Some systems, like Mac or DOS, or
&#160; - &#255;       older versions of Windows for Asia or Eastern
                      Europe, may still be using other codepages, but
                      will often be able to convert. (These codes
                      always have their Latin-1 values in HTML, while
                      Alt codes may actually give you something else,
                      depending on your current input locale).

&#256; and up         These are Unicode codepoints in HTML, and cannot
                      be represented directly with Alt codes. (Typing
                      Alt-0256 gets you right back to Alt-0000).
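
As a rough illustration of the ranges listed above, here is a minimal Python sketch. The function name describe_alt0_code is made up for this example, and the modulo-256 wraparound is an assumption generalised from the Alt-0256 remark; it is not taken from any Windows documentation.

```python
def describe_alt0_code(typed: int) -> str:
    """Classify an Alt-0nnn code using the ranges in the list above."""
    # Assumption: codes wrap modulo 256, as Alt-0256 -> Alt-0000 suggests.
    code = typed % 256
    if code <= 127:
        return "ASCII"
    if code <= 159:
        # Includes a few positions that codepage 1252 leaves unassigned.
        return "CP1252 addition"
    return "Latin-1"


for typed in (65, 128, 154, 233, 256):
    print(f"Alt-{typed:04d}: {describe_alt0_code(typed)}")
```

Under these assumptions, Alt-0154 lands in the CP1252-only range and Alt-0256 is reported as plain ASCII.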
(Omitting the 0 from the Alt codes gets you characters from what's
called the OEM code page, which is set separately from the input
locale, and is usually not Latin-1. This is so people with fingers
trained on DOS code pages can keep using the old codes in Windows).
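
The difference the leading 0 makes can be sketched in Python by decoding the same byte with two codecs. This assumes the OEM code page is CP437 (the US DOS one) and the Windows code page is CP1252; both are only defaults here and vary by locale. The helper name alt_code_char is hypothetical.

```python
import unicodedata


def alt_code_char(digits: str, oem: str = "cp437", ansi: str = "cp1252") -> str:
    """Return the character an Alt code would produce under the assumed code pages."""
    # A leading 0 selects the Windows code page; without it, the OEM one.
    codec = ansi if digits.startswith("0") else oem
    return bytes([int(digits)]).decode(codec)


for digits in ("130", "0130", "0233"):
    char = alt_code_char(digits)
    print(f"Alt-{digits}: U+{ord(char):04X} {unicodedata.name(char)}")
```

With these defaults, Alt-130 and Alt-0233 both give e-acute, while Alt-0130 gives the low-9 quotation mark.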
Anyway, Unicode and HTML do support all the characters from the
128-159 range of codepage 1252, like this:
Alt-0128 = € = U-20AC EURO SIGN
Alt-0130 = ‚ = U-201A SINGLE LOW-9 QUOTATION MARK
Alt-0131 = ƒ = U-0192 LATIN SMALL LETTER F WITH HOOK
Alt-0132 = „ = U-201E DOUBLE LOW-9 QUOTATION MARK
Alt-0133 = … = U-2026 HORIZONTAL ELLIPSIS
Alt-0134 = † = U-2020 DAGGER
Alt-0135 = ‡ = U-2021 DOUBLE DAGGER
Alt-0136 = ˆ = U-02C6 MODIFIER LETTER CIRCUMFLEX ACCENT
Alt-0137 = ‰ = U-2030 PER MILLE SIGN
Alt-0138 = Š = U-0160 LATIN CAPITAL LETTER S WITH CARON
Alt-0139 = ‹ = U-2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK
Alt-0140 = Œ = U-0152 LATIN CAPITAL LIGATURE OE
Alt-0142 = Ž = U-017D LATIN CAPITAL LETTER Z WITH CARON
Alt-0145 = ‘ = U-2018 LEFT SINGLE QUOTATION MARK
Alt-0146 = ’ = U-2019 RIGHT SINGLE QUOTATION MARK
Alt-0147 = “ = U-201C LEFT DOUBLE QUOTATION MARK
Alt-0148 = ” = U-201D RIGHT DOUBLE QUOTATION MARK
Alt-0149 = • = U-2022 BULLET
Alt-0150 = – = U-2013 EN DASH
Alt-0151 = — = U-2014 EM DASH
Alt-0152 = ˜ = U-02DC SMALL TILDE
Alt-0153 = ™ = U-2122 TRADE MARK SIGN
Alt-0154 = š = U-0161 LATIN SMALL LETTER S WITH CARON
Alt-0155 = › = U-203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
Alt-0156 = œ = U-0153 LATIN SMALL LIGATURE OE
Alt-0158 = ž = U-017E LATIN SMALL LETTER Z WITH CARON
Alt-0159 = Ÿ = U-0178 LATIN CAPITAL LETTER Y WITH DIAERESIS
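
A table like the one above can be regenerated, or checked, with Python's built-in cp1252 codec. This is only a sketch: it relies on that codec rejecting the unassigned positions, and the function name cp1252_extras is invented for the example.

```python
import unicodedata


def cp1252_extras() -> dict[int, str]:
    """Map each assigned CP1252 code in 128-159 to its Unicode character."""
    table = {}
    for code in range(128, 160):
        try:
            table[code] = bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec treats these positions as undefined.
            continue
    return table


for code, char in cp1252_extras().items():
    print(f"Alt-{code:04d} = &#{ord(char)}; = U-{ord(char):04X} "
          f"{unicodedata.name(char)}")
```

The middle column printed here is the numeric entity that is safe to use in HTML in place of the 128-159 value.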

For completeness, there are also other Latin-x codes defined --- among
these Latin-9 (ISO 8859-15), which has the Euro sign and seven of the
letters that CP1252 added (Š, š, Ž, ž, Œ, œ, Ÿ) in place of eight of
the symbols of Latin-1. So for conlanging use, it would be just as
good as CP1252 --- except that no one seems to be making systems that
use it. Unicode has stolen a march on it.
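
To see exactly which positions Latin-9 changes, one can compare the two codecs byte by byte. A small Python sketch, using the standard codec names iso8859-1 and iso8859-15; the variable name differences is just for illustration.

```python
import unicodedata

# Collect every byte value that decodes differently in Latin-1 and Latin-9.
differences = {}
for code in range(256):
    raw = bytes([code])
    latin1 = raw.decode("iso8859-1")
    latin9 = raw.decode("iso8859-15")
    if latin1 != latin9:
        differences[code] = (latin1, latin9)

for code, (old, new) in differences.items():
    print(f"0x{code:02X}: {unicodedata.name(old)} -> {unicodedata.name(new)}")
```

All the differing positions lie in the 160-255 range, so Latin-9 leaves the 128-159 control area alone, unlike CP1252.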
Lars Mathiesen (U of Copenhagen CS Dep) <thorinn@...> (Humour NOT marked)