
Dev.SN ♥ developers

posted by martyb on Thursday July 24 2014, @02:16PM

Summary:

Tests every named character entity in the "Special" entity set (HTMLspecial) defined in HTML 4. Each test point contains:

  1. The named character entity's description taken from the HTML 4 specification.
  2. The entity displayed as a UTF-8 encoded character.
  3. The entity displayed as a Named Character Entity.
  4. The entity displayed as a Decimal Character Entity.
  5. The entity displayed as a Hexadecimal Character Entity.
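As a rough illustration of how items 2 through 5 relate to one another, the sketch below derives the four display lines for one entity from its name alone. It relies on Python's standard `html.entities.name2codepoint` table; the helper name `entity_test_lines` is hypothetical and is not part of this test page or of any site code.

```python
from html.entities import name2codepoint


def entity_test_lines(name):
    """Build the four display lines for one HTML 4 named entity.

    Hypothetical helper, written only to illustrate the test layout.
    """
    code_point = name2codepoint[name]      # e.g. 0x152 for "OElig"
    char = chr(code_point)
    octets = char.encode("utf-8")          # UTF-8 octets per RFC 3629
    octet_text = " ".join(f"0x{b:02x}" for b in octets)
    label = "octet" if len(octets) == 1 else "octets"
    return [
        f'"{char}" = {octet_text} (UTF-8 encoded {label})',
        f'"{char}" = &{name};',             # named character entity
        f'"{char}" = &#{code_point};',      # decimal character entity
        f'"{char}" = &#x{code_point:x};',   # hexadecimal character entity
    ]


if __name__ == "__main__":
    for line in entity_test_lines("OElig"):
        print(line)
```

Calling `entity_test_lines("euro")` or any other name from the Special set should produce lines in the same shape as the test points listed under "Tests:".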

Documents Referenced:

http://www.w3.org/TR/html4/sgml/entities.html
"Character entity references in HTML 4"
https://tools.ietf.org/html/rfc3629
"UTF-8, a transformation format of ISO 10646"

Tests:

<!-- Special characters for HTML -->

<!-- Character entity set. Typical invocation:
<!ENTITY % HTMLspecial PUBLIC
"-//W3C//ENTITIES Special//EN//HTML">
%HTMLspecial; -->

<!-- Portions © International Organization for Standardization 1986:
Permission to copy in any form is granted for use with
conforming SGML systems and applications as defined in
ISO 8879, provided this notice is included in all copies.
-->

<!-- Relevant ISO entity set is given unless names are newly introduced.
New names (i.e., not in ISO 8879 list) do not clash with any
existing ISO 8879 entity names. ISO 10646 character numbers
are given for each character, in hex. CDATA values are decimal
conversions of the ISO 10646 values and refer to the document
character set. Names are ISO 10646 names.

-->
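The comment above says that the CDATA values are decimal conversions of the hexadecimal ISO 10646 numbers. A minimal sketch of that relationship, assuming Python's standard `html.unescape` as the decoder (the function name `references_agree` is made up for this example):

```python
import html


def references_agree(name, hex_code_point):
    """Check that the named, decimal and hex references decode identically.

    hex_code_point is the ISO 10646 number as written in the listing,
    without the "U+" prefix, for example "0152".
    """
    code_point = int(hex_code_point, 16)   # hex -> decimal conversion
    expected = chr(code_point)
    forms = [
        f"&{name};",                       # named reference
        f"&#{code_point};",                # decimal reference
        f"&#x{code_point:x};",             # hexadecimal reference
    ]
    return all(html.unescape(form) == expected for form in forms)


if __name__ == "__main__":
    print(references_agree("OElig", "0152"))
    print(references_agree("euro", "20AC"))
```

A mismatch between an entity name and the code point given for it makes the function return `False`, which is one way to spot a transcription error in a listing like the one below.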

<!-- C0 Controls and Basic Latin -->
quotation mark = APL quote, U+0022 ISOnum:
""" = 0x22 (UTF-8 encoded octet)
""" = &quot;
""" = &#34;
""" = &#x22;
ampersand, U+0026 ISOnum:
"&" = 0x26 (UTF-8 encoded octet)
"&" = &amp;
"&" = &#38;
"&" = &#x26;
less-than sign, U+003C ISOnum:
"<" = 0x3c (UTF-8 encoded octet)
"<" = &lt;
"<" = &#60;
"<" = &#x3c;
greater-than sign, U+003E ISOnum:
">" = 0x3e (UTF-8 encoded octet)
">" = &gt;
">" = &#62;
">" = &#x3e;
<!-- Latin Extended-A -->
latin capital ligature OE, U+0152 ISOlat2:
"Œ" = 0xc5 0x92 (UTF-8 encoded octets)
"Œ" = &OElig;
"Œ" = &#338;
"Œ" = &#x152;
latin small ligature oe, U+0153 ISOlat2:
"œ" = 0xc5 0x93 (UTF-8 encoded octets)
"œ" = &oelig;
"œ" = &#339;
"œ" = &#x153;
<!-- ligature is a misnomer, this is a separate character in some languages -->
latin capital letter S with caron, U+0160 ISOlat2:
"Š" = 0xc5 0xa0 (UTF-8 encoded octets)
"Š" = &Scaron;
"Š" = &#352;
"Š" = &#x160;
latin small letter s with caron, U+0161 ISOlat2:
"š" = 0xc5 0xa1 (UTF-8 encoded octets)
"š" = &scaron;
"š" = &#353;
"š" = &#x161;
latin capital letter Y with diaeresis, U+0178 ISOlat2:
"Ÿ" = 0xc5 0xb8 (UTF-8 encoded octets)
"Ÿ" = &Yuml;
"Ÿ" = &#376;
"Ÿ" = &#x178;
<!-- Spacing Modifier Letters -->
modifier letter circumflex accent, U+02C6 ISOpub:
"ˆ" = 0xcb 0x86 (UTF-8 encoded octets)
"ˆ" = &circ;
"ˆ" = &#710;
"ˆ" = &#x2c6;
small tilde, U+02DC ISOdia:
"˜" = 0xcb 0x9c (UTF-8 encoded octets)
"˜" = &tilde;
"˜" = &#732;
"˜" = &#x2dc;
<!-- General Punctuation -->
en space, U+2002 ISOpub:
" " = 0xe2 0x80 0x82 (UTF-8 encoded octets)
" " = &ensp;
" " = &#8194;
" " = &#x2002;
em space, U+2003 ISOpub:
" " = 0xe2 0x80 0x83 (UTF-8 encoded octets)
" " = &emsp;
" " = &#8195;
" " = &#x2003;
thin space, U+2009 ISOpub:
" " = 0xe2 0x80 0x89 (UTF-8 encoded octets)
" " = &thinsp;
" " = &#8201;
" " = &#x2009;
zero width non-joiner, U+200C NEW RFC 2070:
"" = 0xe2 0x80 0x8c (UTF-8 encoded octets)
"‌" = &zwnj;
"‌" = &#8204;
"‌" = &#x200c;
zero width joiner, U+200D NEW RFC 2070:
"" = 0xe2 0x80 0x8d (UTF-8 encoded octets)
"‍" = &zwj;
"‍" = &#8205;
"‍" = &#x200d;
left-to-right mark, U+200E NEW RFC 2070:
"" = 0xe2 0x80 0x8e (UTF-8 encoded octets)
"‎" = &lrm;
"‎" = &#8206;
"‎" = &#x200e;
right-to-left mark, U+200F NEW RFC 2070:
"" = 0xe2 0x80 0x8f (UTF-8 encoded octets)
"‏" = &rlm;
"‏" = &#8207;
"‏" = &#x200f;
en dash, U+2013 ISOpub:
"–" = 0xe2 0x80 0x93 (UTF-8 encoded octets)
"–" = &ndash;
"–" = &#8211;
"–" = &#x2013;
em dash, U+2014 ISOpub:
"—" = 0xe2 0x80 0x94 (UTF-8 encoded octets)
"—" = &mdash;
"—" = &#8212;
"—" = &#x2014;
left single quotation mark, U+2018 ISOnum:
"‘" = 0xe2 0x80 0x98 (UTF-8 encoded octets)
"‘" = &lsquo;
"‘" = &#8216;
"‘" = &#x2018;
right single quotation mark, U+2019 ISOnum:
"’" = 0xe2 0x80 0x99 (UTF-8 encoded octets)
"’" = &rsquo;
"’" = &#8217;
"’" = &#x2019;
single low-9 quotation mark, U+201A NEW:
"‚" = 0xe2 0x80 0x9a (UTF-8 encoded octets)
"‚" = &sbquo;
"‚" = &#8218;
"‚" = &#x201a;
left double quotation mark, U+201C ISOnum:
"“" = 0xe2 0x80 0x9c (UTF-8 encoded octets)
"“" = &ldquo;
"“" = &#8220;
"“" = &#x201c;
right double quotation mark, U+201D ISOnum:
"”" = 0xe2 0x80 0x9d (UTF-8 encoded octets)
"”" = &rdquo;
"”" = &#8221;
"”" = &#x201d;
double low-9 quotation mark, U+201E NEW:
"„" = 0xe2 0x80 0x9e (UTF-8 encoded octets)
"„" = &bdquo;
"„" = &#8222;
"„" = &#x201e;
dagger, U+2020 ISOpub:
"†" = 0xe2 0x80 0xa0 (UTF-8 encoded octets)
"†" = &dagger;
"†" = &#8224;
"†" = &#x2020;
double dagger, U+2021 ISOpub:
"‡" = 0xe2 0x80 0xa1 (UTF-8 encoded octets)
"‡" = &Dagger;
"‡" = &#8225;
"‡" = &#x2021;
per mille sign, U+2030 ISOtech:
"‰" = 0xe2 0x80 0xb0 (UTF-8 encoded octets)
"‰" = &permil;
"‰" = &#8240;
"‰" = &#x2030;
single left-pointing angle quotation mark, U+2039 ISO proposed:
"‹" = 0xe2 0x80 0xb9 (UTF-8 encoded octets)
"‹" = &lsaquo;
"‹" = &#8249;
"‹" = &#x2039;
<!-- lsaquo is proposed but not yet ISO standardized -->
single right-pointing angle quotation mark, U+203A ISO proposed:
"›" = 0xe2 0x80 0xba (UTF-8 encoded octets)
"›" = &rsaquo;
"›" = &#8250;
"›" = &#x203a;
<!-- rsaquo is proposed but not yet ISO standardized -->
euro sign, U+20AC NEW:
"€" = 0xe2 0x82 0xac (UTF-8 encoded octets)
"€" = &euro;
"€" = &#8364;
"€" = &#x20ac;