Dev.SN
Dev.SN ♥ developers
https://dev.soylentnews.org/

Title    Test e-mailing of stories that contain multi-octet UTF-8 chars
Date    Sunday May 24 2015, @10:53PM
Author    martyb
https://dev.soylentnews.org/article.pl?sid=15/05/25/0252239

martyb writes:

This is a test story containing a variety of 1-, 2-, and 3-octet UTF-8 characters. Its purpose is to see how well the e-mailing of stories handles these characters. They were entered directly (in fact, pasted in) rather than as decimal, hexadecimal, or named character entities.
The following table is taken from Section 3, "UTF-8 definition", of RFC 3629: https://tools.ietf.org/html/rfc3629

Char. number range  |        UTF-8 octet sequence
   (hexadecimal)    |              (binary)
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
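
To make the table concrete, here is a minimal Python sketch that packs a code point into octets by following the bit layouts above. The function name utf8_encode is made up for illustration; in practice Python's built-in str.encode("utf-8") does the same job, and the sketch can be checked against it.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single code point using the RFC 3629 bit layouts."""
    # RFC 3629 limits UTF-8 to U+0000..U+10FFFF and excludes the
    # surrogate range U+D800..U+DFFF.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp <= 0x7F:
        # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# One code point from each range used in this story:
for cp in (0x40, 0x0140, 0x0700, 0x0800):
    print(f"U+{cp:04X} -> {utf8_encode(cp).hex(' ')}")
```

Note that 0x0700 still falls in the 2-octet row of the table; only the range starting at 0x0800 needs 3 octets.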

peugen 0x40 0x7f 0x0140 0x017f 0x0700 0x073f 0x0800 0x083f | peu2utf8 > bleh.txt
cat bleh.txt

BEGIN
@ABCDEFGHIJKLMNO
PQRSTUVWXYZ[\]^_
`abcdefghijklmno
pqrstuvwxyz{|}~�

ŀŁłŃńŅņŇňʼnŊŋŌōŎŏ
ŐőŒœŔŕŖŗŘřŚśŜŝŞş
ŠšŢţŤťŦŧŨũŪūŬŭŮů
ŰűŲųŴŵŶŷŸŹźŻżŽžſ

܀܁܂܃܄܅܆܇܈܉܊܋܌܍܎܏
ܐܑܒܓܔܕܖܗܘܙܚܛܜܝܞܟ
ܠܡܢܣܤܥܦܧܨܩܪܫܬܭܮܯ

ࠀࠁࠂࠃࠄࠅࠆࠇࠈࠉࠊࠋࠌࠍࠎࠏ
ࠐࠑࠒࠓࠔࠕࠖࠗ࠘࠙ࠚ
ࠠࠡࠢࠣࠤࠥࠦࠧࠨ࠮࠯
࠰࠱࠲࠳࠴࠵࠶࠷࠸࠹࠺࠻࠼࠽࠾࠿



END.
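
For anyone who wants to produce a similar dump without the peugen and peu2utf8 tools (which are not described in this story), the following Python sketch is one way to do it. It is a guess at the behaviour, not those tools' actual implementation: the name dump_ranges, the inclusive (start, end) pairs, and the 16-characters-per-row layout are assumptions read off the output above.

```python
def dump_ranges(ranges, width=16):
    """Return the characters of each inclusive (start, end) code point
    range as rows of `width` characters, with a blank line between ranges."""
    blocks = []
    for start, end in ranges:
        chars = [chr(cp) for cp in range(start, end + 1)]
        rows = ["".join(chars[i:i + width])
                for i in range(0, len(chars), width)]
        blocks.append("\n".join(rows))
    return "\n\n".join(blocks) + "\n"


# The four ranges given on the command line in this story.
RANGES = [(0x40, 0x7F), (0x0140, 0x017F), (0x0700, 0x073F), (0x0800, 0x083F)]

text = dump_ranges(RANGES)
data = text.encode("utf-8")  # the octets a file such as bleh.txt would hold
```

Writing data to a file in binary mode (rather than writing text in text mode) avoids depending on the platform's default encoding.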

Links

  1. "martyb" - https://dev.soylentnews.org/~martyb/


printed from Dev.SN, Test e-mailing of stories that contain multi-octet UTF-8 chars on 2024-05-16 16:38:50