Converting accented characters to entities

 
Jean-Guy Mouton

(Msg. 1) Posted: Tue Oct 30, 2007 5:38 pm
Post subject: Converting accented characters to entities
Archived from groups: alt.www.webmaster

Hello,

I have a website with accented characters. Do I have to convert them
into html entities in XHTML 1.0 strict and charset=iso-8859-1?

If so, could you recommend a freeware tool?

Thank you.

Ben C

(Msg. 2) Posted: Tue Oct 30, 2007 5:38 pm
Post subject: Re: Converting accented characters to entities

On 2007-10-30, Jean-Guy Mouton <user.TakeThisOut@example.net> wrote:
> Hello,
>
> I have a website with accented characters. Do I have to convert them
> into html entities in XHTML 1.0 strict and charset=iso-8859-1?

No, just make sure your pages are properly saved in ISO-8859-1 and that
the server is configured to deliver the correct charset in the
Content-Type header.

That's assuming ISO-8859-1 covers all the accented characters you need--
what language is it for? If it's French then you should be fine. If it's
Vietnamese (say) then you need a different encoding, probably UTF-8.
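
One way to see what charset a server actually declares is to look at the Content-Type header of a response. The following is only a rough sketch in Python using the standard library; the function names `charset_from_content_type` and `fetch_declared_charset` are made up for this illustration, and the URL is whatever page you want to check.

```python
from email.message import Message
from typing import Optional
from urllib.request import Request, urlopen


def charset_from_content_type(value: str) -> Optional[str]:
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


def fetch_declared_charset(url: str) -> Optional[str]:
    """Send a HEAD request and report the charset the server declares (needs network)."""
    with urlopen(Request(url, method="HEAD")) as response:
        return charset_from_content_type(response.headers.get("Content-Type", ""))


# Offline demonstration of the header parsing only.
print(charset_from_content_type("text/html; charset=ISO-8859-1"))  # iso-8859-1
print(charset_from_content_type("text/html"))                      # None
```

Calling `fetch_declared_charset` with the address of one of your own pages should return `iso-8859-1` if the host is set up as described above; `None` would mean the server sends no charset at all.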

Andy Dingley

(Msg. 3) Posted: Tue Oct 30, 2007 5:38 pm
Post subject: Re: Converting accented characters to entities

On 30 Oct, 16:38, Jean-Guy Mouton <u....RemoveThis@example.net> wrote:

> I have a website with accented characters. Do I have to convert them
> into html entities in XHTML 1.0 strict and charset=iso-8859-1?

If you do things correctly, then they'll work equally well in any of
three ways (even mixed on the same page).
* Directly entered characters "é"
* HTML entity references &eacute;
* numeric character entities &#233;
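
As a quick check that these forms really are interchangeable, here is a small Python sketch using the standard `html.unescape` function. The hexadecimal form is an extra notation not listed above, added for comparison; the variable names are just for illustration.

```python
from html import unescape

# Each way of writing the same letter in HTML source.
forms = {
    "direct character": "\u00e9",
    "character entity reference": "&eacute;",
    "decimal numeric reference": "&#233;",
    "hexadecimal numeric reference": "&#xE9;",
}

# Decode every form the way an HTML parser would.
decoded = {label: unescape(markup) for label, markup in forms.items()}
for label, char in decoded.items():
    print(f"{label:32} -> U+{ord(char):04X}")

# True when every form decodes to the same single character.
all_same = len(set(decoded.values())) == 1
print(all_same)
```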

Just make sure that the web server sends a _matching_ encoding for how
the document was itself encoded. It doesn't matter which encoding you
author in (of encodings that contain the characters you need), so long
as you match it with the HTTP content-type header.
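
To illustrate what "matching" means in practice, this Python sketch decodes the same bytes with the right and the wrong charset label. It is only a simulation of what a browser does with the header, and the variable names are invented for the example.

```python
page_text = "caf\u00e9"  # "café"

utf8_bytes = page_text.encode("utf-8")         # b'caf\xc3\xa9'
latin1_bytes = page_text.encode("iso-8859-1")  # b'caf\xe9'

# Header matches the file: the text round-trips.
matched = utf8_bytes.decode("utf-8")

# File saved as UTF-8 but served as ISO-8859-1: two junk characters appear.
mislabelled_as_latin1 = utf8_bytes.decode("iso-8859-1")

# File saved as ISO-8859-1 but served as UTF-8: the byte is invalid and
# gets replaced with U+FFFD, the "question mark" character.
mislabelled_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(matched, mislabelled_as_latin1, mislabelled_as_utf8)
```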

Ignore <meta> inside the page. The HTTP header takes precedence over it,
so on the web it is of no use and is often misleading.

If you can't reliably control the HTTP content-type header, then use
either form of the entities.
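
The original question asked for a conversion tool. A few lines of Python may be enough; this is a sketch rather than a finished utility, and `to_entities` is a name invented here. It uses the standard `html.entities.codepoint2name` table, leaves ASCII alone, and falls back to a numeric reference where HTML 4 has no named entity.

```python
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Replace every non-ASCII character with a named or numeric reference."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                        # plain ASCII stays as it is
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # e.g. &eacute;
        else:
            out.append(f"&#{cp};")                 # no named entity available
    return "".join(out)


print(to_entities("Cr\u00e8me br\u00fbl\u00e9e"))  # Cr&egrave;me br&ucirc;l&eacute;e
```

The output is pure ASCII, so it displays correctly whatever charset the server announces. Markup characters such as `<` and `&` are deliberately left untouched so existing tags survive.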

If you can have the HTTP content-type header set once, but only once,
then set it to UTF-8 (this is quite common in a corporate
environment).



Some (surprisingly little-known) things that you ought to understand:

- Unicode is a character set, UTF-8 is an encoding that represents its
characters as a sequence of bytes. The two are separate things.

- That Unicode character set is used throughout HTML, whether you
like it or not. When you use numeric character entities, even from an
ISO-8859-* page, the numbers you use refer to Unicode, not to ISO.
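
Both points can be seen in a short Python sketch. The first half shows one code point with several byte encodings; the second half uses the euro sign and ISO-8859-15 as an example chosen here (not from the post) to show that a numeric reference takes the Unicode number, not the byte value from the page's encoding.

```python
from html import unescape

# One character, one Unicode code point, several possible byte encodings.
char = "\u00e9"  # é
code_point = ord(char)
encodings = {name: char.encode(name) for name in ("iso-8859-1", "utf-8", "utf-16-le")}
print(code_point, encodings)

# The euro sign is byte 0xA4 (164) in ISO-8859-15 but U+20AC (8364) in Unicode.
euro = "\u20ac"
euro_byte_in_8859_15 = euro.encode("iso-8859-15")[0]

wrong = unescape(f"&#{euro_byte_in_8859_15};")  # U+00A4, the generic currency sign
right = unescape(f"&#{ord(euro)};")             # U+20AC, the euro sign
print(hex(ord(wrong)), hex(ord(right)))
```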


I would suggest avoiding ISO-8859-* in favour of UTF-8. Some of your
tools will no longer work, but there are plenty that will replace
them, and for free. These days a tool that isn't UTF-8 clean has
little place in a web design shop. The great advantage of UTF-8 is
obviously when you have to support multiple languages - it's near-
essential for doing this on the same page, but it's even worth doing
if you only have to support different language clients from the same
office.

Watch out for UTF-16 from some Windows tools! That "Save as Unicode"
option is often the wrong thing - look further down for UTF-8.

Don't use a BOM (aka UTF-8Y): those extra leading bytes are not ASCII,
so a file with a BOM is no longer compatible with tools that expect
plain ASCII (or most ISO-8859-* encodings).
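
If you are not sure what an editor actually wrote, the first few bytes of the file usually tell you. This Python sketch checks for the UTF-8 and UTF-16 byte order marks using constants from the standard `codecs` module; `sniff_bom` and `strip_utf8_bom` are illustrative names, and UTF-32 is deliberately ignored to keep it short.

```python
import codecs
from typing import Optional

_BOMS = (
    (codecs.BOM_UTF8, "utf-8 with BOM"),
    (codecs.BOM_UTF16_LE, "utf-16 little-endian"),
    (codecs.BOM_UTF16_BE, "utf-16 big-endian"),
)


def sniff_bom(data: bytes) -> Optional[str]:
    """Name the byte order mark at the start of data, or return None."""
    for bom, label in _BOMS:
        if data.startswith(bom):
            return label
    return None


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM if there is one; otherwise return data unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


print(sniff_bom("\u00e9".encode("utf-8-sig")))  # utf-8 with BOM
print(sniff_bom("\u00e9".encode("utf-8")))      # None
```

To test a real file, read its first bytes in binary mode and pass them to `sniff_bom`.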

If your authoring process is only ASCII-clean and you only need
Western European characters, then the character entity references
(e.g. &eacute; rather than &#233; for "é") are simple and robust
against mistakes.

If you need characters from outside Western Europe, then you mostly
can't use character entity references (for any encoding), because HTML 4
defines named entities for little beyond Latin-1, Greek letters and some
symbols. If you use ISO-8859-1 encoding then you MUST use numeric
character entities. If you use UTF-8 then you can use either characters
entered directly, or numeric character entities. As the numerics are
hard to proof-read, this alone is enough reason to favour UTF-8.
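
For the ISO-8859-1 case, Python's built-in `xmlcharrefreplace` error handler does this substitution for you: characters the encoding can hold are written as bytes, and everything else becomes a numeric reference. A minimal sketch, with `encode_for_latin1_page` being a name made up for the example:

```python
def encode_for_latin1_page(text: str) -> bytes:
    """Encode text as ISO-8859-1, using &#N; for characters it cannot hold."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


# é fits in ISO-8859-1; the Hungarian ő (U+0151) does not.
sample = "caf\u00e9 \u0151"
print(encode_for_latin1_page(sample))  # b'caf\xe9 &#337;'
```

The unreadable `&#337;` in the output is exactly the proof-reading problem mentioned above.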


I'd also suggest dropping XHTML in favour of HTML 4.01 Strict, but
that's for HTML reasons, not character encoding.
Jean-Guy Mouton

(Msg. 4) Posted: Tue Oct 30, 2007 5:59 pm
Post subject: Re: Converting accented characters to entities

Ben C wrote:
> No, just make sure your pages are properly saved in ISO-8859-1 and that
> the server is configured to deliver the correct charset in the
> Content-Type header.
How do I check this on the hosting server, please?
>
> That's assuming ISO-8859-1 covers all the accented characters you need--
> what language is it for? If it's French then you should be fine. If it's
Yes, it's French.
1001 Webs

(Msg. 5) Posted: Tue Oct 30, 2007 7:46 pm
Post subject: Re: Converting accented characters to entities

On Oct 30, 5:38 pm, Jean-Guy Mouton <u... DeleteThis @example.net> wrote:
> Hello,
>
> I have a website with accented characters. Do I have to convert them
> into html entities in XHTML 1.0 strict and charset=iso-8859-1?
>
> If so, could you recommend a freeware tool?
>
> Thank you.

Use UTF-8 whenever you can.
UTF-8 is able to represent any character in the Unicode standard, yet
the initial encoding of byte codes and character assignments for UTF-8
is backwards compatible with ASCII.
For these reasons, it is steadily becoming the preferred encoding for
e-mail, web pages, and other places where characters are stored or
streamed.

Advantages

* UTF-8 is a superset of ASCII. Since a plain ASCII string is also
a valid UTF-8 string, no conversion needs to be done for existing
ASCII text. Software designed for traditional non-extended ASCII
character sets can generally be used with UTF-8 with few or no
changes.
* Sorting of UTF-8 strings using standard byte-oriented sorting
routines will produce the same results as sorting them based on
Unicode code points. (This has limited usefulness, though, since it is
unlikely to represent the culturally acceptable sort order of any
particular language or locale.)
* UTF-8 and UTF-16 are the standard encodings for XML documents.
All other encodings must be specified explicitly either externally or
through a text declaration.
* Any byte oriented string search algorithm can be used with UTF-8
data (as long as one ensures that the inputs only consist of complete
UTF-8 characters). Care must be taken with regular expressions and
other constructs that count characters, however.
* UTF-8 strings can be fairly reliably recognized as such by a
simple algorithm. That is, the probability that a string of characters
in any other encoding appears as valid UTF-8 is low, diminishing with
increasing string length. For instance, the octet values C0, C1, F5 to
FF never appear. For better reliability, regular expressions can be
used to take into account illegal overlong and surrogate values (see
the W3 FAQ: Multilingual Forms for a Perl regular expression to
validate a UTF-8 string).

http://en.wikipedia.org/wiki/UTF-8#Advantages
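
A few of the properties in the quoted list can be tried out directly. The Python sketch below is only an illustration with invented names: it relies on Python's strict UTF-8 decoder (which rejects overlong forms and surrogates) instead of the Perl regular expression mentioned above.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 according to Python's strict decoder."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Plain ASCII text has the same bytes in ASCII and in UTF-8.
ascii_is_utf8 = "plain ASCII".encode("ascii") == "plain ASCII".encode("utf-8")

# Sorting by raw UTF-8 bytes gives the same order as sorting by code points
# (Python compares str values code point by code point).
words = ["zebra", "\u00e9clair", "apple", "\u03a9mega", "\u65e5\u672c", "\U0001F600"]
by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))
by_code_points = sorted(words)
same_order = by_bytes == by_code_points

print(ascii_is_utf8, same_order)
print(looks_like_utf8("caf\u00e9".encode("utf-8")))  # True
print(looks_like_utf8(b"caf\xe9"))                   # False: ISO-8859-1 bytes
```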