Re: TECH: dumb html question
From: Mark J. Reed <markjreed@...>
Date: Wednesday, January 14, 2004, 14:03
On Tue, Jan 13, 2004 at 03:28:03PM +0100, Benct Philip Jonsson wrote:
> At 14:34 12.1.2004, Mark J. Reed wrote:
> >RM> Does "&schwa;" exist?
> >
> >Nope. Although one of the nice things about XHTML is that XML lets you
> >define your own entities, so that you can add it if you like.
>
> How is that done? Can it be done in a stylesheet?
The later replies on this thread by me and Tristan had an example,
but here's what is required:
1. A browser that will apply HTML-type rendering to an XHTML document.
It appears that Internet Explorer doesn't do this. If it
recognizes a document as XHTML and parses it as an XML document,
all you get is pretty-printed source, rather than a visual rendering.
Or, if it parses it as an HTML document, then it is rendered, but
XML-specific tricks like custom entities don't work.
If anyone knows a way to convince IE to both parse as XML and
render as HTML, please let me know.
Mozilla/Firebird/Gecko works fine; I haven't tried Opera or
any other browsers.
2. The document must be recognized by the browser as XHTML, not just
HTML.
That means the Content-Type sent by the web server has to
be "application/xhtml+xml", not "text/html". A modern web
server will do the right thing if the file is named with
a .xhtml suffix; any browser that meets condition 1 will
probably also do the right thing if you open a local file
with such a suffix.
Note that if the web server is not configured properly, you can't
fake it with a <meta http-equiv> element, because by the time that
element is processed the browser has already decided whether it's HTML
or XHTML.
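For local testing, the Content-Type requirement can be met without
touching a real server's configuration. What follows is only a minimal
sketch, assuming Python 3 and its standard http.server module; the names
XHTMLHandler and serve() are made up for illustration and are not part of
any existing setup.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class XHTMLHandler(SimpleHTTPRequestHandler):
    """Static file handler that labels .xhtml files as XHTML."""

    # Copy the stock suffix table, then add the one mapping that matters here.
    extensions_map = dict(SimpleHTTPRequestHandler.extensions_map)
    extensions_map[".xhtml"] = "application/xhtml+xml"


def serve(port=8000):
    """Serve the current directory until interrupted (hypothetical helper)."""
    HTTPServer(("127.0.0.1", port), XHTMLHandler).serve_forever()
```

Calling serve() from the directory that holds the .xhtml files and then
loading a page from http://127.0.0.1:8000/ should hand the browser the
application/xhtml+xml type; the port number is an arbitrary choice.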
3. The document has to be legal X[HT]ML.
Among other things, this means that the very first thing in it (not
even leading whitespace may come before it) should be the XML declaration:
<?xml version="1.0"?>
An XML document is assumed to be UTF-8-encoded Unicode by
default; if you're using another character set, such as Latin-1,
you must say so in the XML declaration, like so:
<?xml version="1.0" encoding="iso-8859-1"?>
After that you need a <!DOCTYPE> directive (see below), and then
you can finally get things rolling with the opening tag of the
<html> element - which needs some extra attributes.
The rest of the document has to be valid
XHTML: lowercase element names, all empty elements explicitly
closed (<br />), all attribute values quoted, etc.
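A quick way to catch the most common slips before involving a browser is
to run the file through any XML parser. The sketch below assumes Python 3
and its standard xml.etree.ElementTree module; is_well_formed() and the
GOOD sample are illustrative names. It checks well-formedness only, so it
will not notice DTD-level problems such as uppercase element names.

```python
import xml.etree.ElementTree as ET


def is_well_formed(source):
    """Return True if source parses as XML, False otherwise."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


# A tiny document that follows the rules listed above.
GOOD = (
    '<?xml version="1.0"?>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">'
    '<head><title>t</title></head>'
    '<body><p class="x">one<br />two</p></body></html>'
)

print(is_well_formed(GOOD))                            # True
print(is_well_formed(" " + GOOD))                      # False: whitespace before the declaration
print(is_well_formed(GOOD.replace("<br />", "<br>")))  # False: empty element not closed
print(is_well_formed(GOOD.replace('"x"', "x")))        # False: unquoted attribute value
```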
4. Any custom entities are defined in the DOCTYPE directive.
XML defines an entire (infinitely large) family of markup languages; the
<!DOCTYPE> directive tells the parser which particular language is in
use for the document containing it, by pointing to a formal description of
that language (called a Document Type Definition, or DTD). For describing
a web page, the particular language is XHTML, but even then, there are
several dialects to choose from. If your web page is using any of the
older presentation-type markup (<body bgcolor=>, <font>, <center>, etc.; basically
anything that is supposed to be done with stylesheets these days), you need
to label it as XHTML 1.0 Transitional:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Otherwise, you should label it as XHTML 1.1:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
You also need to specify the default namespace and language
in the <html> tag:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
The first time you're trying to get all this working, you should
probably make sure the document validates as-is before trying to
add the custom entities. You can check it at http://validator.w3.org.
Custom entities are defined by adding <!ENTITY> declarations inside the
<!DOCTYPE> declaration. For instance, if you want entities for the
IPA vowel symbols:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"
[
<!ENTITY alpha "ɑ">
<!ENTITY talpha "ɒ">
<!ENTITY openo "ɔ">
<!ENTITY reve "ɘ">
<!ENTITY schwa "ə">
<!ENTITY eps "ɛ">
<!ENTITY reveps "ɜ">
.
.
.
<!ENTITY ups "ʊ">
.
.
.
]
>
Then you include those entities in the text just like the built-in ones:
<p>Phonemically, the word &lt;about&gt; is /&schwa;'b&alpha;&ups;t/</p>
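To see the entity mechanism work end to end without a browser, the whole
thing can be fed to an XML parser. This is a sketch under assumptions:
Python 3 with its standard xml.etree.ElementTree module, a cut-down
document (DOC) invented for the test, and numeric character references in
the entity values so that the file stays plain ASCII. The parser does not
fetch the external DTD; it only reads the declarations between the square
brackets.

```python
import xml.etree.ElementTree as ET

# Hypothetical test document: three of the IPA entities, declared in the
# internal subset of the DOCTYPE, then used in a paragraph.
DOC = """<?xml version="1.0"?>
<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"
[
  <!ENTITY alpha "&#x251;">
  <!ENTITY schwa "&#x259;">
  <!ENTITY ups   "&#x28A;">
]>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head><title>Entity test</title></head>
  <body><p>/&schwa;'b&alpha;&ups;t/</p></body>
</html>
"""

XHTML = "{http://www.w3.org/1999/xhtml}"  # namespace prefix used by ElementTree

root = ET.fromstring(DOC)
para = root.find(f"{XHTML}body/{XHTML}p")

# The custom entities have been replaced by the characters they stand for.
print(para.text)
```

An entity that is used but never declared makes the parse fail with a
ParseError, which is a handy way to spot a misspelled entity name.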
-Mark