Re: OT: Question: Unicode

From: Carlos Thompson <chlewey@...>
Date: Sunday, May 18, 2003, 6:42
Roger Mills wrote:


> I've created a web page using MS Word, and Lucida Sans Unicode. In the
> header, MS says "charset-MS 1252" or somesuch. Should this be changed to
> UTF8?
Well, you should say UTF-8 if the text file is in UTF-8 format, that is, if you will give characters above ASCII with variable-length codes (those that look like Ã¡ for an á).

You should use MS 1252, or better, ISO-8859-1, if you plan to use Latin-1 codes (as in this e-mail) and HTML numeric entities (those codes that look like "&#8221;") for Unicode values over 255.

You might use either, or plain ASCII, if you want to give HTML numeric entities for every character above ASCII.

Simply write an a-acute (á), then open your file with Notepad or any other text editor of that kind. Look at how that a-acute appears in the text editor:

  If it is an a-acute, the file is in Latin-1: use ISO-8859-1 or MS 1252.
  If it is an A-tilde followed by something else, the file is using UTF-8.
  If it says &aacute; or &#225; or &#xE1;, your editor is making a plain ASCII file.

Now, if your HTML editor can convert between formats, then:

ASCII: files with HTML numeric entities are more portable and less browser-dependent. Anyhow, most probably those browsers that will accept an HTML numeric entity for an IPA extension in Lucida Sans Unicode, and show it correctly, will also support a different encoding.

Latin-1 (ISO-8859-1): ideal if you are writing mostly in Western European languages. You can still use HTML numeric entities for non-Latin-1 characters.

Microsoft Latin-1 (MS codepage 1252): well, it includes some nice characters in the 128-159 section that ISO Latin-1 leaves unused, like opening and closing quotation marks and the Euro sign, but they are still available in Unicode above 255.

UTF-8: makes shorter files if you are using lots of codes not available in Latin-1 or any other ISO-8859 code page. UTF-8 files are difficult to edit in common text editors (vi, pico, notepad, wordpad, etc.), but if you will never touch the HTML file with a text editor, you should not worry.

--
Carlos Th
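The a-acute test described in this message can also be run on the bytes themselves instead of by eye. Below is a minimal Python sketch of that idea; the function name guess_encoding and its return labels are made up for illustration, and the check is only a rough heuristic, since Latin-1 and MS 1252 cannot be told apart from a single á.

```python
def guess_encoding(data: bytes) -> str:
    """Rough guess at how a file containing an a-acute was saved.

    Hypothetical helper for illustration; not a general-purpose detector.
    """
    # Only 7-bit bytes: any accented letter must be written as an entity.
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        # Valid multi-byte sequences (e.g. C3 A1 for a-acute) decode cleanly.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # A lone high byte such as E1 is not valid UTF-8.
        return "latin-1 or cp1252"


a_acute = "\u00e1"  # the character á

for codec in ("latin-1", "utf-8"):
    raw = a_acute.encode(codec)
    print(codec, raw, guess_encoding(raw))

# A file that spells the character as an entity is plain ASCII.
print(guess_encoding(b"&aacute; or &#225; or &#xE1;"))

# UTF-8 bytes read as Latin-1 give the "A-tilde followed by something else".
print(a_acute.encode("utf-8").decode("latin-1"))

# Producing a numeric entity for a character outside ASCII.
print(a_acute.encode("ascii", "xmlcharrefreplace"))
```

Whichever result comes back for the real file is the value that should go in the page's charset declaration.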

Replies

Roger Mills <romilly@...>
Roger Mills <romilly@...>
Herman Miller <hmiller@...>