UTF-8

From 福留子孫
Revision as of 21:37, 24 June 2023 (Sat) by 丁志仁

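Meaning of each byte in UTF-8 encoding (◾ = bit 1, ◽ = bit 0, ? = either):

  • ◽???????, 0~127: if the first bit of a byte B is 0, B on its own represents one character (an ASCII character);
  • ◾◽??????, 128~191: if the first bit of B is 1 and the second bit is 0, B is a continuation byte inside a multi-byte (non-ASCII) character;
The marker bits contribute a fixed 128; the remaining 6 bits carry a value of 0~63.
  • ◾◾◽?????, 192~223: if the first two bits of B are 1 and the third bit is 0, B is the first byte of a character encoded in two bytes;
The marker bits contribute a fixed 192; the remaining 5 bits carry (0~31)×64.
  • ◾◾◾◽????, 224~239: if the first three bits of B are 1 and the fourth bit is 0, B is the first byte of a character encoded in three bytes;
  • ◾◾◾◾◽???, 240~247: if the first four bits of B are 1 and the fifth bit is 0, B is the first byte of a character encoded in four bytes.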
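
The ranges in the list above can be turned into a small lookup. The following is a minimal sketch in Python; the function name `classify_byte` and its return labels are assumptions made for illustration, and it only looks at the leading bits of a single byte, without checking whether a whole sequence is well-formed.

```python
def classify_byte(b: int) -> str:
    """Classify one byte value (0~255) by its leading bits."""
    if not 0 <= b <= 255:
        raise ValueError("byte value must be in 0~255")
    if b < 0x80:        # 0xxxxxxx, 0~127
        return "ascii"
    if b < 0xC0:        # 10xxxxxx, 128~191
        return "continuation"
    if b < 0xE0:        # 110xxxxx, 192~223
        return "lead-2"
    if b < 0xF0:        # 1110xxxx, 224~239
        return "lead-3"
    if b < 0xF8:        # 11110xxx, 240~247
        return "lead-4"
    return "other"      # 248~255 match none of the patterns listed above


# Classify each byte of a three-byte character.
for b in "瓦".encode("utf-8"):
    print(b, classify_byte(b))
```

For 「瓦」 this reports one `lead-3` byte followed by two `continuation` bytes.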

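Assembling a character (using the 3-byte Chinese character 「瓦」 as an example):

  1. First byte 231: 231-224=7, weight 4096
  2. Second byte 147: 147-128=19, weight 64
  3. Third byte 166: 166-128=38, weight 1
Code point = 7×4096+19×64+38 = 29926 (U+74E6), written in HTML as &#29926;, which displays as 瓦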
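
The same arithmetic can be written as a short function. This is a minimal sketch in Python, not a full decoder: the name `decode_three_bytes` is hypothetical, and it checks only the byte ranges given earlier, not the extra well-formedness rules (overlong forms, surrogates) that a real UTF-8 decoder also enforces.

```python
def decode_three_bytes(b1: int, b2: int, b3: int) -> int:
    """Combine a 3-byte UTF-8 sequence into a code point."""
    if not 224 <= b1 <= 239:
        raise ValueError("first byte must be in 224~239")
    if not (128 <= b2 <= 191 and 128 <= b3 <= 191):
        raise ValueError("second and third bytes must be in 128~191")
    # Subtract the marker value from each byte, then apply its weight.
    return (b1 - 224) * 4096 + (b2 - 128) * 64 + (b3 - 128)


code_point = decode_three_bytes(231, 147, 166)
print(code_point, hex(code_point), chr(code_point))
```

The result can be cross-checked against Python's own codec with `ord("瓦")` and `"瓦".encode("utf-8")`.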

1-byte encoding:

0~31: control characters

DEC  OCT  HEX  BIN       Symbol  HTML Number  HTML Name  Description
0    000  00   00000000  NUL     &#0;                    Null character
1    001  01   00000001  SOH     &#1;                    Start of Heading
2    002  02   00000010  STX     &#2;                    Start of Text
3    003  03   00000011  ETX     &#3;                    End of Text
4    004  04   00000100  EOT     &#4;                    End of Transmission
5    005  05   00000101  ENQ     &#5;                    Enquiry
6    006  06   00000110  ACK     &#6;                    Acknowledge
7    007  07   00000111  BEL     &#7;                    Bell, Alert
8    010  08   00001000  BS      &#8;                    Backspace
9    011  09   00001001  HT      &#9;                    Horizontal Tab
10   012  0A   00001010  LF      &#10;                   Line Feed
11   013  0B   00001011  VT      &#11;                   Vertical Tabulation
12   014  0C   00001100  FF      &#12;                   Form Feed
13   015  0D   00001101  CR      &#13;                   Carriage Return
14   016  0E   00001110  SO      &#14;                   Shift Out
15   017  0F   00001111  SI      &#15;                   Shift In
16   020  10   00010000  DLE     &#16;                   Data Link Escape
17   021  11   00010001  DC1     &#17;                   Device Control One (XON)
18   022  12   00010010  DC2     &#18;                   Device Control Two
19   023  13   00010011  DC3     &#19;                   Device Control Three (XOFF)
20   024  14   00010100  DC4     &#20;                   Device Control Four
21   025  15   00010101  NAK     &#21;                   Negative Acknowledge
22   026  16   00010110  SYN     &#22;                   Synchronous Idle
23   027  17   00010111  ETB     &#23;                   End of Transmission Block
24   030  18   00011000  CAN     &#24;                   Cancel
25   031  19   00011001  EM      &#25;                   End of medium
26   032  1A   00011010  SUB     &#26;                   Substitute
27   033  1B   00011011  ESC     &#27;                   Escape
28   034  1C   00011100  FS      &#28;                   File Separator
29   035  1D   00011101  GS      &#29;                   Group Separator
30   036  1E   00011110  RS      &#30;                   Record Separator
31   037  1F   00011111  US      &#31;                   Unit Separator

32~127: printable characters (127, DEL, is a control character)

DEC  OCT  HEX  BIN       Symbol  HTML Number  HTML Name  Description
32   040  20   00100000  SP      &#32;                   Space
33   041  21   00100001  !       &#33;        &excl;     Exclamation mark
34   042  22   00100010  "       &#34;        &quot;     Double quotes (or speech marks)
35   043  23   00100011  #       &#35;        &num;      Number sign
36   044  24   00100100  $       &#36;        &dollar;   Dollar
37   045  25   00100101  %       &#37;        &percnt;   Per cent sign
38   046  26   00100110  &       &#38;        &amp;      Ampersand
39   047  27   00100111  '       &#39;        &apos;     Single quote
40   050  28   00101000  (       &#40;        &lpar;     Open parenthesis (or open bracket)
41   051  29   00101001  )       &#41;        &rpar;     Close parenthesis (or close bracket)
42   052  2A   00101010  *       &#42;        &ast;      Asterisk
43   053  2B   00101011  +       &#43;        &plus;     Plus
44   054  2C   00101100  ,       &#44;        &comma;    Comma
45   055  2D   00101101  -       &#45;                   Hyphen-minus
46   056  2E   00101110  .       &#46;        &period;   Period, dot or full stop
47   057  2F   00101111  /       &#47;        &sol;      Slash or divide
48   060  30   00110000  0       &#48;                   Zero
49   061  31   00110001  1       &#49;                   One
50   062  32   00110010  2       &#50;                   Two
51   063  33   00110011  3       &#51;                   Three
52   064  34   00110100  4       &#52;                   Four
53   065  35   00110101  5       &#53;                   Five
54   066  36   00110110  6       &#54;                   Six
55   067  37   00110111  7       &#55;                   Seven
56   070  38   00111000  8       &#56;                   Eight
57   071  39   00111001  9       &#57;                   Nine
58   072  3A   00111010  :       &#58;        &colon;    Colon
59   073  3B   00111011  ;       &#59;        &semi;     Semicolon
60   074  3C   00111100  <       &#60;        &lt;       Less than (or open angled bracket)
61   075  3D   00111101  =       &#61;        &equals;   Equals
62   076  3E   00111110  >       &#62;        &gt;       Greater than (or close angled bracket)
63   077  3F   00111111  ?       &#63;        &quest;    Question mark
64   100  40   01000000  @       &#64;        &commat;   At sign
65   101  41   01000001  A       &#65;                   Uppercase A
66   102  42   01000010  B       &#66;                   Uppercase B
67   103  43   01000011  C       &#67;                   Uppercase C
68   104  44   01000100  D       &#68;                   Uppercase D
69   105  45   01000101  E       &#69;                   Uppercase E
70   106  46   01000110  F       &#70;                   Uppercase F
71   107  47   01000111  G       &#71;                   Uppercase G
72   110  48   01001000  H       &#72;                   Uppercase H
73   111  49   01001001  I       &#73;                   Uppercase I
74   112  4A   01001010  J       &#74;                   Uppercase J
75   113  4B   01001011  K       &#75;                   Uppercase K
76   114  4C   01001100  L       &#76;                   Uppercase L
77   115  4D   01001101  M       &#77;                   Uppercase M
78   116  4E   01001110  N       &#78;                   Uppercase N
79   117  4F   01001111  O       &#79;                   Uppercase O
80   120  50   01010000  P       &#80;                   Uppercase P
81   121  51   01010001  Q       &#81;                   Uppercase Q
82   122  52   01010010  R       &#82;                   Uppercase R
83   123  53   01010011  S       &#83;                   Uppercase S
84   124  54   01010100  T       &#84;                   Uppercase T
85   125  55   01010101  U       &#85;                   Uppercase U
86   126  56   01010110  V       &#86;                   Uppercase V
87   127  57   01010111  W       &#87;                   Uppercase W
88   130  58   01011000  X       &#88;                   Uppercase X
89   131  59   01011001  Y       &#89;                   Uppercase Y
90   132  5A   01011010  Z       &#90;                   Uppercase Z
91   133  5B   01011011  [       &#91;        &lsqb;     Opening bracket
92   134  5C   01011100  \       &#92;        &bsol;     Backslash
93   135  5D   01011101  ]       &#93;        &rsqb;     Closing bracket
94   136  5E   01011110  ^       &#94;        &Hat;      Caret - circumflex
95   137  5F   01011111  _       &#95;        &lowbar;   Underscore
96   140  60   01100000  `       &#96;        &grave;    Grave accent
97   141  61   01100001  a       &#97;                   Lowercase a
98   142  62   01100010  b       &#98;                   Lowercase b
99   143  63   01100011  c       &#99;                   Lowercase c
100  144  64   01100100  d       &#100;                  Lowercase d
101  145  65   01100101  e       &#101;                  Lowercase e
102  146  66   01100110  f       &#102;                  Lowercase f
103  147  67   01100111  g       &#103;                  Lowercase g
104  150  68   01101000  h       &#104;                  Lowercase h
105  151  69   01101001  i       &#105;                  Lowercase i
106  152  6A   01101010  j       &#106;                  Lowercase j
107  153  6B   01101011  k       &#107;                  Lowercase k
108  154  6C   01101100  l       &#108;                  Lowercase l
109  155  6D   01101101  m       &#109;                  Lowercase m
110  156  6E   01101110  n       &#110;                  Lowercase n
111  157  6F   01101111  o       &#111;                  Lowercase o
112  160  70   01110000  p       &#112;                  Lowercase p
113  161  71   01110001  q       &#113;                  Lowercase q
114  162  72   01110010  r       &#114;                  Lowercase r
115  163  73   01110011  s       &#115;                  Lowercase s
116  164  74   01110100  t       &#116;                  Lowercase t
117  165  75   01110101  u       &#117;                  Lowercase u
118  166  76   01110110  v       &#118;                  Lowercase v
119  167  77   01110111  w       &#119;                  Lowercase w
120  170  78   01111000  x       &#120;                  Lowercase x
121  171  79   01111001  y       &#121;                  Lowercase y
122  172  7A   01111010  z       &#122;                  Lowercase z
123  173  7B   01111011  {       &#123;       &lcub;     Opening brace
124  174  7C   01111100  |       &#124;       &verbar;   Vertical bar
125  175  7D   01111101  }       &#125;       &rcub;     Closing brace
126  176  7E   01111110  ~       &#126;                  Equivalency sign - tilde
127  177  7F   01111111  DEL     &#127;                  Delete
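
The DEC, OCT, HEX and BIN columns are four spellings of the same number, so they can be regenerated rather than typed by hand. A minimal sketch in Python; the function name `numeric_columns` is hypothetical and the padding widths are chosen to match the table (3 octal digits, 2 hex digits, 8 binary digits).

```python
def numeric_columns(n: int) -> tuple[str, str, str, str]:
    """Return the DEC, OCT, HEX and BIN spellings of one byte value 0~127."""
    if not 0 <= n <= 127:
        raise ValueError("single-byte UTF-8 covers 0~127 only")
    return (str(n), f"{n:03o}", f"{n:02X}", f"{n:08b}")


# Print the numeric columns for a few rows of the table.
for n in (10, 65, 127):
    print(*numeric_columns(n))
```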

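2-byte encoding:

Equivalence

: As a second byte, 「◾◽??????」 and 「◽◽??????」 carry the same 6-bit value (0~63) once the top two bits are dropped, so a decoder that simply masks those bits treats them as equivalent. Only 「◾◽??????」 (128~191) is well-formed UTF-8, however; a strict decoder rejects 「◽◽??????」 in that position.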
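
One way to see how a particular decoder treats the two forms is to feed it both. The following is a minimal sketch using Python's built-in UTF-8 codec, which decodes strictly by default; the helper name `strict_decode` is an assumption made for illustration. The byte pair 194, 161 is the %C2%A1 that appears in the first link below, and 194, 33 is the same pair with the top bit of the second byte cleared.

```python
from typing import Optional


def strict_decode(data: bytes) -> Optional[str]:
    """Return the decoded text, or None if data is not well-formed UTF-8."""
    try:
        return data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return None


print(strict_decode(bytes([194, 161])))  # second byte is 10100001
print(strict_decode(bytes([194, 33])))   # second byte is 00100001
```

Python decodes the first pair to ¡ and returns `None` for the second; other decoders may be more lenient.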

  1. https://graphemica.com/%C2%A1
  2. https://www.ascii-code.com/
  3. http://jendo.org/study/showChar.html
  4. http://jendo.org/study/seeDecode.php