
Dev.SN ♥ developers

posted by martyb on Sunday May 24 2015, @10:53PM

This is a test story containing a variety of 1-, 2-, and 3-octet UTF-8 characters. Its purpose is to see how well the e-mailing of stories handles these characters. They were entered directly (actually, by cut-and-paste) rather than as decimal, hex, or named character entities.
The following table is taken from "3. UTF-8 definition" in https://tools.ietf.org/html/rfc3629:

Char. number range  |        UTF-8 octet sequence
   (hexadecimal)    |              (binary)
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
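
To make the table concrete, here is a minimal Python sketch that packs a code point into octets row by row. The function name encode_utf8 is made up for illustration and is not part of the story or of any tool it mentions; in practice Python's built-in str.encode("utf-8") does the same job.

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode one Unicode scalar value using the four rows of the RFC 3629 table."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("code point outside 0000 0000-0010 FFFF")
    if 0xD800 <= code_point <= 0xDFFF:
        # RFC 3629 prohibits encoding the UTF-16 surrogate range.
        raise ValueError("surrogate code points cannot be encoded")

    if code_point <= 0x7F:
        # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([
            0xC0 | (code_point >> 6),
            0x80 | (code_point & 0x3F),
        ])
    if code_point <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (code_point >> 12),
            0x80 | ((code_point >> 6) & 0x3F),
            0x80 | (code_point & 0x3F),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (code_point >> 18),
        0x80 | ((code_point >> 12) & 0x3F),
        0x80 | ((code_point >> 6) & 0x3F),
        0x80 | (code_point & 0x3F),
    ])
```

For example, encode_utf8(0x0140) yields the two octets C5 80 and encode_utf8(0x0800) yields the three octets E0 A0 80, matching the second and third rows of the table.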

peugen 0x40 0x7f 0x0140 0x017f 0x0700 0x073f 0x0800 0x083f | peu2utf8 > bleh.txt
cat bleh.txt

BEGIN
@ABCDEFGHIJKLMNO
PQRSTUVWXYZ[\]^_
`abcdefghijklmno
pqrstuvwxyz{|}~�

ŀŁłŃńŅņŇňŉŊŋŌōŎŏ
ŐőŒœŔŕŖŗŘřŚśŜŝŞş
ŠšŢţŤťŦŧŨũŪūŬŭŮů
ŰűŲųŴŵŶŷŸŹźŻżŽžſ

܀܁܂܃܄܅܆܇܈܉܊܋܌܍܎܏
ܐܑܒܓܔܕܖܗܘܙܚܛܜܝܞܟ
ܠܡܢܣܤܥܦܧܨܩܪܫܬܭܮܯ

ࠀࠁࠂࠃࠄࠅࠆࠇࠈࠉࠊࠋࠌࠍࠎࠏ
ࠐࠑࠒࠓࠔࠕࠖࠗ࠘࠙ࠚ
ࠠࠡࠢࠣࠤࠥࠦࠧࠨ࠮࠯
࠰࠱࠲࠳࠴࠵࠶࠷࠸࠹࠺࠻࠼࠽࠾࠿



END.
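
For readers without the peugen and peu2utf8 tools, a dump of the same shape can be approximated in Python. This is only a sketch: the behavior of the two tools is inferred from the output above (arguments read as start/end pairs, 16 characters per row, a blank line after each range, BEGIN and END. markers), and the name dump_ranges is hypothetical.

```python
def dump_ranges(bounds: list[int], per_row: int = 16) -> str:
    """Return a BEGIN/END. dump of the inclusive code point ranges in `bounds`.

    `bounds` is a flat list of start/end pairs, mirroring the assumed
    command-line arguments: [0x40, 0x7f, 0x0140, 0x017f, ...].
    """
    if len(bounds) % 2 != 0:
        raise ValueError("bounds must contain start/end pairs")

    lines = ["BEGIN"]
    for start, end in zip(bounds[0::2], bounds[1::2]):
        # Emit the range in rows of `per_row` characters.
        for row_start in range(start, end + 1, per_row):
            row_end = min(row_start + per_row - 1, end)
            lines.append("".join(chr(cp) for cp in range(row_start, row_end + 1)))
        # Blank line after each range, as in the listing above.
        lines.append("")
    lines.append("END.")
    return "\n".join(lines) + "\n"
```

Writing the result with open("bleh.txt", "w", encoding="utf-8") produces the UTF-8 octets on disk. Unlike the listing above, this sketch also emits unassigned and combining code points, so some rows may render differently depending on the font and terminal.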
