utf8 test - HTML special - in a story submission - HTML formatted

posted by martyb on Thursday July 24 2014, @02:16PM

martyb writes:

Summary:

Tests all HTML Special named character entities defined in HTML 4. Each test point contains:

The named character entity's description taken from the HTML 4 specification.
The entity displayed as a UTF-8 encoded character.
The entity displayed as a Named Character Entity.
The entity displayed as a Decimal Character Entity.
The entity displayed as a Hexadecimal Character Entity.
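
As a rough illustration of the four forms listed above, the following Python sketch derives them for a given code point. The helper name `entity_forms` is hypothetical and not part of the original test; it assumes Python's standard `html.entities.codepoint2name` table, which covers the HTML 4 named entities.

```python
from html.entities import codepoint2name  # HTML 4 code point -> entity name


def entity_forms(cp: int) -> dict:
    """Return the four representations used in each test point (hypothetical helper)."""
    ch = chr(cp)
    name = codepoint2name.get(cp)  # None when HTML 4 defines no named entity
    return {
        # UTF-8 octets, formatted like the test lines (e.g. "0xe2 0x82 0xac")
        "utf8_octets": " ".join(f"0x{b:02x}" for b in ch.encode("utf-8")),
        "named": f"&{name};" if name else None,
        "decimal": f"&#{cp};",
        "hexadecimal": f"&#x{cp:x};",
    }


forms = entity_forms(0x20AC)  # euro sign
print(forms)
```

Printing the result for U+20AC should show the same octets that the euro sign test line reports.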

Documents Referenced:

http://www.w3.org/TR/html4/sgml/entities.html: "Character entity references in HTML 4"
https://tools.ietf.org/html/rfc3629: "UTF-8, a transformation format of ISO 10646"
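
The octet values quoted in the tests follow from the bit layout described in RFC 3629. The sketch below encodes a code point by hand so the arithmetic is visible; the function name `utf8_encode` is an assumption made for illustration, and in practice `chr(cp).encode("utf-8")` does the same job.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (illustrative, per RFC 3629 layout)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:  # 1 octet: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:  # 2 octets: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # 3 octets: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 octets: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


print(utf8_encode(0x152).hex(" "))   # latin capital ligature OE
print(utf8_encode(0x20AC).hex(" "))  # euro sign
```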

Tests:

quotation mark = APL quote, U+0022 ISOnum:: """ = 0x22 (UTF-8 encoded octet); """ = "; """ = "; """ = "
ampersand, U+0026 ISOnum:: "&" = 0x26 (UTF-8 encoded octet); "&" = &; "&" = &; "&" = &
less-than sign, U+003C ISOnum:: "<" = 0x3c (UTF-8 encoded octet); "<" = <; "<" = <; "<" = <
greater-than sign, U+003E ISOnum:: ">" = 0x3e (UTF-8 encoded octet); ">" = >; ">" = >; ">" = >
latin capital ligature OE, U+0152 ISOlat2:: "Œ" = 0xc5 0x92 (UTF-8 encoded octets); "Œ" = &OElig;; "Œ" = Œ; "Œ" = Œ
latin small ligature oe, U+0153 ISOlat2:: "œ" = 0xc5 0x93 (UTF-8 encoded octets); "œ" = &oelig;; "œ" = œ; "œ" = œ
latin capital letter S with caron, U+0160 ISOlat2:: "Š" = 0xc5 0xa0 (UTF-8 encoded octets); "Š" = &Scaron;; "Š" = Š; "Š" = Š
latin small letter s with caron, U+0161 ISOlat2:: "š" = 0xc5 0xa1 (UTF-8 encoded octets); "š" = &scaron;; "š" = š; "š" = š
latin capital letter Y with diaeresis, U+0178 ISOlat2:: "Ÿ" = 0xc5 0xb8 (UTF-8 encoded octets); "Ÿ" = &Yuml;; "Ÿ" = Ÿ; "Ÿ" = Ÿ
modifier letter circumflex accent, U+02C6 ISOpub:: "ˆ" = 0xcb 0x86 (UTF-8 encoded octets); "ˆ" = &circ;; "ˆ" = ˆ; "ˆ" = ˆ
small tilde, U+02DC ISOdia:: "˜" = 0xcb 0x9c (UTF-8 encoded octets); "˜" = &tilde;; "˜" = ˜; "˜" = ˜
en space, U+2002 ISOpub:: " " = 0xe2 0x80 0x82 (UTF-8 encoded octets); " " = &ensp;; " " =  ; " " =  
em space, U+2003 ISOpub:: " " = 0xe2 0x80 0x83 (UTF-8 encoded octets); " " = &emsp;; " " =  ; " " =  
thin space, U+2009 ISOpub:: " " = 0xe2 0x80 0x89 (UTF-8 encoded octets); " " =  ; " " =  ; " " =  
zero width non-joiner, U+200C NEW RFC 2070:: "" = 0xe2 0x80 0x8c (UTF-8 encoded octets); "�" = &zwnj;; "�" = ‌; "�" = ‌
zero width joiner, U+200D NEW RFC 2070:: "" = 0xe2 0x80 0x8d (UTF-8 encoded octets); "�" = &zwj;; "�" = ‍; "�" = ‍
left-to-right mark, U+200E NEW RFC 2070:: "" = 0xe2 0x80 0x8e (UTF-8 encoded octets); "�" = &lrm;; "�" = ‎; "�" = ‎
right-to-left mark, U+200F NEW RFC 2070:: "" = 0xe2 0x80 0x8f (UTF-8 encoded octets); "�" = &rlm;; "�" = ‏; "�" = ‏
en dash, U+2013 ISOpub:: "–" = 0xe2 0x80 0x93 (UTF-8 encoded octets); "–" = –; "–" = –; "–" = –
em dash, U+2014 ISOpub:: "—" = 0xe2 0x80 0x94 (UTF-8 encoded octets); "—" = —; "—" = —; "—" = —
left single quotation mark, U+2018 ISOnum:: "‘" = 0xe2 0x80 0x98 (UTF-8 encoded octets); "‘" = ‘; "‘" = ‘; "‘" = ‘
right single quotation mark, U+2019 ISOnum:: "’" = 0xe2 0x80 0x99 (UTF-8 encoded octets); "’" = ’; "’" = ’; "’" = ’
single low-9 quotation mark, U+201A NEW:: "‚" = 0xe2 0x80 0x9a (UTF-8 encoded octets); "‚" = &sbquo;; "‚" = ‚; "‚" = ‚
left double quotation mark, U+201C ISOnum:: "“" = 0xe2 0x80 0x9c (UTF-8 encoded octets); "“" = “; "“" = “; "“" = “
right double quotation mark, U+201D ISOnum:: "”" = 0xe2 0x80 0x9d (UTF-8 encoded octets); "”" = ”; "”" = ”; "”" = ”
double low-9 quotation mark, U+201E NEW:: "„" = 0xe2 0x80 0x9e (UTF-8 encoded octets); "„" = &bdquo;; "„" = „; "„" = „
dagger, U+2020 ISOpub:: "†" = 0xe2 0x80 0xa0 (UTF-8 encoded octets); "†" = &dagger;; "†" = †; "†" = †
double dagger, U+2021 ISOpub:: "‡" = 0xe2 0x80 0xa1 (UTF-8 encoded octets); "‡" = &Dagger;; "‡" = ‡; "‡" = ‡
per mille sign, U+2030 ISOtech:: "‰" = 0xe2 0x80 0xb0 (UTF-8 encoded octets); "‰" = &permil;; "‰" = ‰; "‰" = ‰
single left-pointing angle quotation mark, U+2039 ISO proposed:: "‹" = 0xe2 0x80 0xb9 (UTF-8 encoded octets); "‹" = &lsaquo;; "‹" = ‹; "‹" = ‹
single right-pointing angle quotation mark, U+203A ISO proposed:: "›" = 0xe2 0x80 0xba (UTF-8 encoded octets); "›" = &rsaquo;; "›" = ›; "›" = ›
euro sign, U+20AC NEW:: "€" = 0xe2 0x82 0xac (UTF-8 encoded octets); "€" = €; "€" = €; "€" = €
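
One way to cross-check test points like those above outside the browser is to confirm that the named, decimal, and hexadecimal references all decode to the same character. This is only a sketch: it uses Python's `html.unescape`, which follows HTML5 decoding rules rather than the site's own filter, and the helper name and the small sample table are assumptions chosen for illustration.

```python
import html


def decodes_consistently(name: str, cp: int) -> bool:
    """True if the named, decimal, and hex references all decode to chr(cp)."""
    expected = chr(cp)
    references = (f"&{name};", f"&#{cp};", f"&#x{cp:X};")
    return all(html.unescape(ref) == expected for ref in references)


# A few entries taken from the list of tests (name -> code point).
SAMPLES = {"quot": 0x22, "OElig": 0x152, "zwnj": 0x200C, "euro": 0x20AC}

results = {name: decodes_consistently(name, cp) for name, cp in SAMPLES.items()}
print(results)
```

A `False` entry would point to a disagreement between the three reference forms in the decoder, not necessarily to a problem with the page being tested.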
