
Re: Inserting accent marks

From:Lars Henrik Mathiesen <thorinn@...>
Date:Tuesday, January 8, 2002, 18:11
> Date: Tue, 8 Jan 2002 21:11:11 +1100
> From: Tristan Alexander McLeay <anstouh@...>
>
> On Mon, 7 Jan 2002, David Starner wrote:
>
> > [Quoting someone else:]
> > >Many accented chars. are available in ASCII (nos. up to 255),
> > >though not all of them transmit to everyone, e.g. Alt+0154 &#353;
> > >(s-hacek).
>
> > That's because they aren't in ASCII - ASCII is a 7-bit code,
> > 0-177, including characters only for English. Latin 1 added
> > accented characters for western Europe, and Microsoft added stuff
> > like &#353; and the Euro into the space Latin-1 used for control
> > characters that Windows doesn't need.
>
> MS didn't add any char at 353. MS added a few chars into the upper
> control characters section of Latin 1 (ISO8859-1) creating
> WinLatin-1. I get the s hacek as control U grave. (That is ^Ugrave,
> except when I move the cursor across it, it skips the U.)
OK, enough guesswork --- please cut out the following and save it for
when this discussion breaks out again in about a month...

This is how the list of Alt-0 codes in Windows and numeric entities
like &#353; in HTML breaks down:

Alt-0000 - Alt-0127   Normal ASCII -- no need to use special codes.

Alt-0128              The Euro sign, added to Windows Latin-1 by
                      Microsoft a few years ago, without any version
                      indication. (See also next entry).

Alt-0129 - Alt-0159   Other characters added to the original Windows
                      Latin-1 superset (codepage 1252). These will
                      normally only display on Windows systems ---
                      especially since Windows applications tend to
                      lie and claim that text containing them is real
                      Latin-1, or even plain ASCII. (If labelled
                      correctly, other systems do have a chance of
                      converting them to Unicode or something else
                      they can display).

                      These numeric values make no sense in HTML ---
                      don't use them.

Alt-0160 - Alt-0255   Latin-1 proper. Some systems, like Mac or DOS,
&#160; - &#255;       or older versions of Windows for Asia or Eastern
                      Europe, may still be using other codepages, but
                      will often be able to convert. (These codes
                      always have their Latin-1 values in HTML, while
                      Alt codes may actually give you something else,
                      depending on your current input locale).

&#256; and up         These are Unicode codepoints in HTML, and cannot
                      be represented directly with Alt codes. (Typing
                      Alt-0256 gets you right back to Alt-0000).

(Omitting the 0 from the Alt codes gets you characters from what's
called the OEM code page, which is set separately from the input
locale, and is usually not Latin-1. This is so people with fingers
trained on DOS code pages can keep using the old codes in Windows).
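The breakdown above can be sketched in a few lines of Python, assuming the Windows input locale is codepage 1252 (the post notes that other locales give other results). The function names classify_alt0 and html_entity are made up for illustration; the only library feature used is Python's bundled "cp1252" codec.

```python
def classify_alt0(code: int) -> str:
    """Name the range an Alt-0nnn code falls in, following the list above."""
    code %= 256  # per the post, Alt-0256 wraps around to Alt-0000
    if code <= 127:
        return "ASCII"
    if code <= 159:
        return "CP1252 addition"
    return "Latin-1"


def html_entity(code: int) -> str:
    """Return the HTML numeric entity for a CP1252 byte value (0-255).

    Assumes codepage 1252. The few byte values that CP1252 leaves
    unassigned raise UnicodeDecodeError in Python's codec.
    """
    char = bytes([code]).decode("cp1252")
    return f"&#{ord(char)};"
```

With these definitions, html_entity(154) returns "&#353;", the s-hacek from the quoted messages, while html_entity(233) returns "&#233;" because the Latin-1 range keeps its values in HTML.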
Anyway, Unicode and HTML do support all the characters from the
128-159 range of codepage 1252, like this:

Alt-0128 = &#8364; = U-20AC EURO SIGN
Alt-0130 = &#8218; = U-201A SINGLE LOW-9 QUOTATION MARK
Alt-0131 = &#402;  = U-0192 LATIN SMALL LETTER F WITH HOOK
Alt-0132 = &#8222; = U-201E DOUBLE LOW-9 QUOTATION MARK
Alt-0133 = &#8230; = U-2026 HORIZONTAL ELLIPSIS
Alt-0134 = &#8224; = U-2020 DAGGER
Alt-0135 = &#8225; = U-2021 DOUBLE DAGGER
Alt-0136 = &#710;  = U-02C6 MODIFIER LETTER CIRCUMFLEX ACCENT
Alt-0137 = &#8240; = U-2030 PER MILLE SIGN
Alt-0138 = &#352;  = U-0160 LATIN CAPITAL LETTER S WITH CARON
Alt-0139 = &#8249; = U-2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK
Alt-0140 = &#338;  = U-0152 LATIN CAPITAL LIGATURE OE
Alt-0142 = &#381;  = U-017D LATIN CAPITAL LETTER Z WITH CARON
Alt-0145 = &#8216; = U-2018 LEFT SINGLE QUOTATION MARK
Alt-0146 = &#8217; = U-2019 RIGHT SINGLE QUOTATION MARK
Alt-0147 = &#8220; = U-201C LEFT DOUBLE QUOTATION MARK
Alt-0148 = &#8221; = U-201D RIGHT DOUBLE QUOTATION MARK
Alt-0149 = &#8226; = U-2022 BULLET
Alt-0150 = &#8211; = U-2013 EN DASH
Alt-0151 = &#8212; = U-2014 EM DASH
Alt-0152 = &#732;  = U-02DC SMALL TILDE
Alt-0153 = &#8482; = U-2122 TRADE MARK SIGN
Alt-0154 = &#353;  = U-0161 LATIN SMALL LETTER S WITH CARON
Alt-0155 = &#8250; = U-203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
Alt-0156 = &#339;  = U-0153 LATIN SMALL LIGATURE OE
Alt-0158 = &#382;  = U-017E LATIN SMALL LETTER Z WITH CARON
Alt-0159 = &#376;  = U-0178 LATIN CAPITAL LETTER Y WITH DIAERESIS

For completeness, there are also other Latin-x codes defined --- among
these Latin-9 (8859-15), which has the Euro sign and the same seven
letters that CP1252 added, instead of eight of the punctuation signs
of Latin-1. So for conlanging use, it would be just as good as CP1252
--- except that no one seems to be making systems that use it. Unicode
has stolen its march.

Lars Mathiesen (U of Copenhagen CS Dep) <thorinn@...> (Humour NOT marked)
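The Latin-9 remark in the post can be checked against Python's bundled codecs. This is only a sketch: latin9_changes is a hypothetical helper name, and it relies on Python's "iso8859_1", "iso8859_15" and "cp1252" codecs matching the published tables.

```python
def latin9_changes():
    """Map each byte value where ISO 8859-15 differs from ISO 8859-1
    to a (Latin-1 character, Latin-9 character) pair."""
    changes = {}
    for code in range(256):
        raw = bytes([code])
        old = raw.decode("iso8859_1")
        new = raw.decode("iso8859_15")
        if old != new:
            changes[code] = (old, new)
    return changes


if __name__ == "__main__":
    for code, (old, new) in latin9_changes().items():
        # An empty result means CP1252 has no byte for this character.
        in_cp1252 = new.encode("cp1252", errors="ignore") != b""
        print(f"0x{code:02X}: U+{ord(old):04X} -> U+{ord(new):04X}"
              f"  (also in CP1252: {in_cp1252})")
```

Run as a script, it lists the byte positions that Latin-9 reassigns and whether each replacement character is also available in CP1252.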

Replies

Philip Newton <philip.newton@...>
John Cowan <jcowan@...>