On 30 Oct, 16:38, Jean-Guy Mouton <u....RemoveThis@example.net> wrote:
> I have a website with accented characters. Do I have to convert them
> into html entities in XHTML 1.0 strict and charset=iso-8859-1?
If you do things correctly, then they'll work equally well in any of
three ways (even mixed on the same page).
* Directly entered characters "é"
* HTML entity references "&eacute;"
* numeric character entities "&#233;"
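As a quick check that the three forms really are interchangeable, here is a minimal sketch in Python (my choice of language, nothing in the question depends on it). It uses the standard library's html.unescape as a stand-in for what a browser does when it resolves references.

```python
import html

# The three spellings of "é" listed above.
direct = "é"            # character entered directly
named = "&eacute;"      # HTML entity reference
numeric = "&#233;"      # numeric character entity (decimal)

# Resolve any references, roughly as a browser's parser would.
decoded = [html.unescape(s) for s in (direct, named, numeric)]

# All three end up as the same single character.
all_same = len(set(decoded)) == 1
print(decoded, all_same)
```

Mixing the forms on one page is harmless for the same reason: after parsing they are the same character.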
Just make sure that the web server sends a _matching_ encoding for how
the document was itself encoded. It doesn't matter which encoding you
author in (of encodings that contain the characters you need), so long
as you match it with the HTTP content-type header.
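To show what "matching" means in practice, here is a small Python sketch. It only simulates the browser's decoding step with bytes.decode; the sample word is my own, not from your site.

```python
text = "café"

# The same document saved in two different encodings.
utf8_bytes = text.encode("utf-8")          # "é" becomes two bytes
latin1_bytes = text.encode("iso-8859-1")   # "é" becomes one byte

# Label matches the bytes: the text survives.
ok = utf8_bytes.decode("utf-8")

# UTF-8 bytes labelled as ISO-8859-1: the familiar "Ã©" garbage.
garbled = utf8_bytes.decode("iso-8859-1")

# ISO-8859-1 bytes labelled as UTF-8: not even valid input.
try:
    latin1_bytes.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(ok, garbled, strict_failed)
```

A real browser won't raise an error in the last case; it typically shows a replacement character instead, but the page is broken either way.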
Ignore <meta> inside the page. The HTTP header takes precedence over
it, so it's of no use on the web and is often misleading.
If you can't reliably control the HTTP content-type header, then use
either form of the entities.
If you can have the HTTP content-type header set once, but only once,
then set it to UTF-8 (this is quite common in a corporate
environment).
Some (surprisingly little-known) things that you ought to understand:
- Unicode is a character set; UTF-8 is an encoding that represents its
characters as a sequence of bytes. The two are separate functions.
- That Unicode character set is used throughout HTML, whether you
like it or not. When you use numeric character entities, even from an
ISO-8859-* page, the numbers you use refer to Unicode, not to ISO.
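Both points can be seen in a few lines of Python. This is only an illustration; the euro sign and ISO-8859-15 are my own choice of example, picked because the byte value and the Unicode number differ.

```python
import html

# One character, one Unicode number, several possible byte encodings.
e_acute = "é"
code_point = ord(e_acute)                 # 233, i.e. U+00E9
utf8 = e_acute.encode("utf-8")            # b"\xc3\xa9" (two bytes)
latin1 = e_acute.encode("iso-8859-1")     # b"\xe9" (one byte)

# The euro sign sits at byte 0xA4 (164) in ISO-8859-15 ...
euro = "€"
euro_byte = euro.encode("iso-8859-15")    # b"\xa4"

# ... but the numeric entity uses the Unicode number, 8364.
euro_ref = f"&#{ord(euro)};"              # "&#8364;"

# Using the byte value instead gives a different character entirely.
wrong = html.unescape("&#164;")           # "¤", the generic currency sign
right = html.unescape(euro_ref)           # "€"

print(code_point, utf8, latin1, euro_byte, euro_ref, wrong, right)
```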
I would suggest avoiding ISO-8859-* in favour of UTF-8. Some of your
tools will no longer work, but there are plenty that will replace
them, and for free. These days a tool that isn't UTF-8 clean has
little place in a web design shop. The great advantage of UTF-8 is
obviously when you have to support multiple languages - it's near-
essential for doing this on the same page, but it's even worth doing
if you only have to support different language clients from the same
office.
Watch out for UTF-16 from some Windows tools! That "Save as Unicode"
option is often the wrong thing - look further down for UTF-8.
Don't use a BOM (aka UTF-8Y), as its three leading bytes stop the file
being compatible with ASCII (and with most ISO-8859-* encodings).
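If you suspect a tool has handed you UTF-16 or a BOM, a check along these lines will tell you. It's a rough sketch: sniff_bom and to_plain_utf8 are names I made up, and it only looks at the BOM, so BOM-less UTF-16 or UTF-32 files are not detected.

```python
import codecs
from typing import Optional


def sniff_bom(data: bytes) -> Optional[str]:
    """Return a Python codec name if the data starts with a known BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"   # UTF-8 with the three-byte BOM in front
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"      # the codec reads the BOM to pick the byte order
    return None              # no BOM seen; says nothing more than that


def to_plain_utf8(data: bytes) -> bytes:
    """Re-encode as BOM-less UTF-8, assuming UTF-8 when no BOM is found."""
    text = data.decode(sniff_bom(data) or "utf-8")
    return text.encode("utf-8")


with_bom = "é".encode("utf-8-sig")                       # b"\xef\xbb\xbf\xc3\xa9"
utf16 = codecs.BOM_UTF16_LE + "é".encode("utf-16-le")    # "Save as Unicode" style
print(sniff_bom(with_bom), sniff_bom(utf16), to_plain_utf8(utf16))
```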
If your authoring process is only ASCII-clean and you only need
Western European characters, then the character entity references
(e.g. &eacute; rather than &#233; for "é") are simple and robust
against mistakes.
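For an ASCII-only workflow, the conversion can be automated. The following is a sketch, not a finished tool: to_named_entities is a hypothetical helper of mine, it relies on Python's html.entities table of HTML 4 entity names, and it assumes the input is already markup (it does not escape &, < or >).

```python
from html.entities import codepoint2name


def to_named_entities(text: str) -> str:
    """Replace every non-ASCII character with an entity reference."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                        # plain ASCII passes through
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named, e.g. &eacute;
        else:
            out.append(f"&#{cp};")                # no name: fall back to numeric
    return "".join(out)


converted = to_named_entities("café à la crème")
print(converted)   # caf&eacute; &agrave; la cr&egrave;me
```

The output is pure ASCII, so it survives any tool or server setting that is at least ASCII-clean.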
If you need characters from outside Western Europe, then you can't
use character entity references (for any encoding). If you use
ISO-8859-1 encoding then you MUST use numeric character entities. If
you use UTF-8 then you can use either characters entered directly, or
numeric character entities. As the numerics are hard to proof-read,
this alone is enough reason to favour UTF-8.
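To make the difference concrete, here is a short Python sketch using a Polish place name (my own example). Python's "xmlcharrefreplace" error handler produces the numeric form for anything the target encoding can't hold.

```python
import html

name = "Łódź"

# ISO-8859-1 has "ó" but not "Ł" or "ź", so those two become numerics.
latin1_page = name.encode("iso-8859-1", errors="xmlcharrefreplace")
print(latin1_page)   # b'&#321;\xf3d&#378;'

# UTF-8 holds all four characters directly; nothing needs escaping.
utf8_page = name.encode("utf-8")

# Both versions decode back to the same text once references are resolved.
from_latin1 = html.unescape(latin1_page.decode("iso-8859-1"))
from_utf8 = utf8_page.decode("utf-8")
print(from_latin1 == from_utf8 == name)
```

Nobody is going to spot a wrong digit in "&#321;" by eye, which is the proof-reading problem in a nutshell.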
I'd also suggest dropping XHTML in favour of HTML 4.01 Strict, but
that's for HTML reasons, not character encoding.