UTF-8

From 福留子孫
Revision as of 21:36, 24 June 2023 (Sat) by 丁志仁


Meaning of each byte in the UTF-8 encoding (◽ = bit 0, ◾ = bit 1, ? = either):

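  • ◽???????, 0~127: if the first bit of a byte B in UTF-8 encoded data is 0, B stands alone as one character (an ASCII code);
  • ◾◽??????, 128~191: if the first bit of B is 1 and the second bit is 0, B is a continuation byte inside a multi-byte (non-ASCII) character;
The marker bits contribute a fixed 128; the remaining six bits carry a value of 0~63.
  • ◾◾◽?????, 192~223: if the first two bits of B are 1 and the third bit is 0, B is the first byte of a two-byte character;
The marker bits contribute a fixed 192; the remaining five bits carry (0~31)×64.
  • ◾◾◾◽????, 224~239: if the first three bits of B are 1 and the fourth bit is 0, B is the first byte of a three-byte character;
The marker bits contribute a fixed 224; the remaining four bits carry (0~15)×4096.
  • ◾◾◾◾◽???, 240~247: if the first four bits of B are 1 and the fifth bit is 0, B is the first byte of a four-byte character.
The marker bits contribute a fixed 240; the remaining three bits carry (0~7)×262144.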
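
The byte classes listed above can be checked mechanically by comparing a byte against the range boundaries. The following is a minimal Python sketch, not part of the original page; the function name `classify_utf8_byte` and the returned labels are hypothetical, chosen only for illustration. It looks at the leading bits only and does not check whether a whole sequence is well formed.

```python
def classify_utf8_byte(b: int) -> str:
    """Return the role of one byte value (0-255) based on its leading bits."""
    if not 0 <= b <= 255:
        raise ValueError("byte value must be in 0..255")
    if b < 0x80:   # 0???????  (0~127): stands alone, ASCII
        return "ascii"
    if b < 0xC0:   # 10??????  (128~191): continuation byte
        return "continuation"
    if b < 0xE0:   # 110?????  (192~223): first byte of a 2-byte character
        return "lead2"
    if b < 0xF0:   # 1110????  (224~239): first byte of a 3-byte character
        return "lead3"
    if b < 0xF8:   # 11110???  (240~247): first byte of a 4-byte character
        return "lead4"
    return "invalid"  # 248~255 match none of the patterns above


print([classify_utf8_byte(b) for b in "瓦".encode("utf-8")])
# → ['lead3', 'continuation', 'continuation']
```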

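Composing a character (using the 3-byte Chinese character 「瓦」 as an example):

  1. First byte 231: 231-224=7, weight 4096
  2. Second byte 147: 147-128=19, weight 64
  3. Third byte 166: 166-128=38, weight 1
Code point = 7×4096 + 19×64 + 38 = 29926 (U+74E6), which is written as 瓦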
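
The same arithmetic can be sketched in Python. The helper name `decode_3byte` is hypothetical, and the sketch covers only the three-byte pattern used in the example. It checks the byte ranges but not the further rules a full decoder applies (overlong forms, surrogates).

```python
def decode_3byte(b1: int, b2: int, b3: int) -> int:
    """Combine one 3-byte UTF-8 sequence into a code point."""
    if not (224 <= b1 <= 239 and 128 <= b2 <= 191 and 128 <= b3 <= 191):
        raise ValueError("not a 3-byte UTF-8 pattern")
    # Subtract the marker value from each byte, then apply its weight.
    return (b1 - 224) * 4096 + (b2 - 128) * 64 + (b3 - 128)


code = decode_3byte(231, 147, 166)
print(code, hex(code), chr(code))  # → 29926 0x74e6 瓦

# Cross-check against Python's built-in decoder.
assert bytes([231, 147, 166]).decode("utf-8") == chr(code)
```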

0~127

0~31 Control characters

DEC  OCT  HEX  BIN       Symbol  HTML Number  HTML Name  Description
0    000  00   00000000  NUL     &#0;                    Null character
1    001  01   00000001  SOH     &#1;                    Start of Heading
2    002  02   00000010  STX     &#2;                    Start of Text
3    003  03   00000011  ETX     &#3;                    End of Text
4    004  04   00000100  EOT     &#4;                    End of Transmission
5    005  05   00000101  ENQ     &#5;                    Enquiry
6    006  06   00000110  ACK     &#6;                    Acknowledge
7    007  07   00000111  BEL     &#7;                    Bell, Alert
8    010  08   00001000  BS      &#8;                    Backspace
9    011  09   00001001  HT      &#9;                    Horizontal Tab
10   012  0A   00001010  LF      &#10;                   Line Feed
11   013  0B   00001011  VT      &#11;                   Vertical Tabulation
12   014  0C   00001100  FF      &#12;                   Form Feed
13   015  0D   00001101  CR      &#13;                   Carriage Return
14   016  0E   00001110  SO      &#14;                   Shift Out
15   017  0F   00001111  SI      &#15;                   Shift In
16   020  10   00010000  DLE     &#16;                   Data Link Escape
17   021  11   00010001  DC1     &#17;                   Device Control One (XON)
18   022  12   00010010  DC2     &#18;                   Device Control Two
19   023  13   00010011  DC3     &#19;                   Device Control Three (XOFF)
20   024  14   00010100  DC4     &#20;                   Device Control Four
21   025  15   00010101  NAK     &#21;                   Negative Acknowledge
22   026  16   00010110  SYN     &#22;                   Synchronous Idle
23   027  17   00010111  ETB     &#23;                   End of Transmission Block
24   030  18   00011000  CAN     &#24;                   Cancel
25   031  19   00011001  EM      &#25;                   End of medium
26   032  1A   00011010  SUB     &#26;                   Substitute
27   033  1B   00011011  ESC     &#27;                   Escape
28   034  1C   00011100  FS      &#28;                   File Separator
29   035  1D   00011101  GS      &#29;                   Group Separator
30   036  1E   00011110  RS      &#30;                   Record Separator
31   037  1F   00011111  US      &#31;                   Unit Separator

32~127 Printable characters (127, DEL, is a control character)

DEC  OCT  HEX  BIN       Symbol  HTML Number  HTML Name  Description
32   040  20   00100000  SP      &#32;                   Space
33   041  21   00100001  !       &#33;        &excl;     Exclamation mark
34   042  22   00100010  "       &#34;        &quot;     Double quotes (or speech marks)
35   043  23   00100011  #       &#35;        &num;      Number sign
36   044  24   00100100  $       &#36;        &dollar;   Dollar
37   045  25   00100101  %       &#37;        &percnt;   Per cent sign
38   046  26   00100110  &       &#38;        &amp;      Ampersand
39   047  27   00100111  '       &#39;        &apos;     Single quote
40   050  28   00101000  (       &#40;        &lparen;   Open parenthesis (or open bracket)
41   051  29   00101001  )       &#41;        &rparen;   Close parenthesis (or close bracket)
42   052  2A   00101010  *       &#42;        &ast;      Asterisk
43   053  2B   00101011  +       &#43;        &plus;     Plus
44   054  2C   00101100  ,       &#44;        &comma;    Comma
45   055  2D   00101101  -       &#45;                   Hyphen-minus
46   056  2E   00101110  .       &#46;        &period;   Period, dot or full stop
47   057  2F   00101111  /       &#47;        &sol;      Slash or divide
48   060  30   00110000  0       &#48;                   Zero
49   061  31   00110001  1       &#49;                   One
50   062  32   00110010  2       &#50;                   Two
51   063  33   00110011  3       &#51;                   Three
52   064  34   00110100  4       &#52;                   Four
53   065  35   00110101  5       &#53;                   Five
54   066  36   00110110  6       &#54;                   Six
55   067  37   00110111  7       &#55;                   Seven
56   070  38   00111000  8       &#56;                   Eight
57   071  39   00111001  9       &#57;                   Nine
58   072  3A   00111010  :       &#58;        &colon;    Colon
59   073  3B   00111011  ;       &#59;        &semi;     Semicolon
60   074  3C   00111100  <       &#60;        &lt;       Less than (or open angled bracket)
61   075  3D   00111101  =       &#61;        &equals;   Equals
62   076  3E   00111110  >       &#62;        &gt;       Greater than (or close angled bracket)
63   077  3F   00111111  ?       &#63;        &quest;    Question mark
64   100  40   01000000  @       &#64;        &commat;   At sign
65   101  41   01000001  A       &#65;                   Uppercase A
66   102  42   01000010  B       &#66;                   Uppercase B
67   103  43   01000011  C       &#67;                   Uppercase C
68   104  44   01000100  D       &#68;                   Uppercase D
69   105  45   01000101  E       &#69;                   Uppercase E
70   106  46   01000110  F       &#70;                   Uppercase F
71   107  47   01000111  G       &#71;                   Uppercase G
72   110  48   01001000  H       &#72;                   Uppercase H
73   111  49   01001001  I       &#73;                   Uppercase I
74   112  4A   01001010  J       &#74;                   Uppercase J
75   113  4B   01001011  K       &#75;                   Uppercase K
76   114  4C   01001100  L       &#76;                   Uppercase L
77   115  4D   01001101  M       &#77;                   Uppercase M
78   116  4E   01001110  N       &#78;                   Uppercase N
79   117  4F   01001111  O       &#79;                   Uppercase O
80   120  50   01010000  P       &#80;                   Uppercase P
81   121  51   01010001  Q       &#81;                   Uppercase Q
82   122  52   01010010  R       &#82;                   Uppercase R
83   123  53   01010011  S       &#83;                   Uppercase S
84   124  54   01010100  T       &#84;                   Uppercase T
85   125  55   01010101  U       &#85;                   Uppercase U
86   126  56   01010110  V       &#86;                   Uppercase V
87   127  57   01010111  W       &#87;                   Uppercase W
88   130  58   01011000  X       &#88;                   Uppercase X
89   131  59   01011001  Y       &#89;                   Uppercase Y
90   132  5A   01011010  Z       &#90;                   Uppercase Z
91   133  5B   01011011  [       &#91;        &lsqb;     Opening bracket
92   134  5C   01011100  \       &#92;        &bsol;     Backslash
93   135  5D   01011101  ]       &#93;        &rsqb;     Closing bracket
94   136  5E   01011110  ^       &#94;        &Hat;      Caret - circumflex
95   137  5F   01011111  _       &#95;        &lowbar;   Underscore
96   140  60   01100000  `       &#96;        &grave;    Grave accent
97   141  61   01100001  a       &#97;                   Lowercase a
98   142  62   01100010  b       &#98;                   Lowercase b
99   143  63   01100011  c       &#99;                   Lowercase c
100  144  64   01100100  d       &#100;                  Lowercase d
101  145  65   01100101  e       &#101;                  Lowercase e
102  146  66   01100110  f       &#102;                  Lowercase f
103  147  67   01100111  g       &#103;                  Lowercase g
104  150  68   01101000  h       &#104;                  Lowercase h
105  151  69   01101001  i       &#105;                  Lowercase i
106  152  6A   01101010  j       &#106;                  Lowercase j
107  153  6B   01101011  k       &#107;                  Lowercase k
108  154  6C   01101100  l       &#108;                  Lowercase l
109  155  6D   01101101  m       &#109;                  Lowercase m
110  156  6E   01101110  n       &#110;                  Lowercase n
111  157  6F   01101111  o       &#111;                  Lowercase o
112  160  70   01110000  p       &#112;                  Lowercase p
113  161  71   01110001  q       &#113;                  Lowercase q
114  162  72   01110010  r       &#114;                  Lowercase r
115  163  73   01110011  s       &#115;                  Lowercase s
116  164  74   01110100  t       &#116;                  Lowercase t
117  165  75   01110101  u       &#117;                  Lowercase u
118  166  76   01110110  v       &#118;                  Lowercase v
119  167  77   01110111  w       &#119;                  Lowercase w
120  170  78   01111000  x       &#120;                  Lowercase x
121  171  79   01111001  y       &#121;                  Lowercase y
122  172  7A   01111010  z       &#122;                  Lowercase z
123  173  7B   01111011  {       &#123;       &lcub;     Opening brace
124  174  7C   01111100  |       &#124;       &verbar;   Vertical bar
125  175  7D   01111101  }       &#125;       &rcub;     Closing brace
126  176  7E   01111110  ~       &#126;       &tilde;    Equivalency sign - tilde
127  177  7F   01111111  DEL     &#127;                  Delete
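
The numeric columns of the tables above all derive from the decimal code, so they can be regenerated rather than looked up. A small Python sketch, assuming the column layout shown here; the helper name `ascii_row` is hypothetical, and the HTML Name column is left out because the names do not follow from the number.

```python
def ascii_row(code: int) -> tuple:
    """Return (DEC, OCT, HEX, BIN, HTML Number) strings for a code in 0..127."""
    if not 0 <= code <= 127:
        raise ValueError("code must be in 0..127")
    return (
        str(code),        # DEC
        f"{code:03o}",    # OCT, zero-padded to 3 digits
        f"{code:02X}",    # HEX, zero-padded to 2 digits
        f"{code:08b}",    # BIN, zero-padded to 8 bits
        f"&#{code};",     # HTML numeric character reference
    )


print(ascii_row(65))  # → ('65', '101', '41', '01000001', '&#65;')
```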

Equivalence

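: 「◾◽??????」 and 「◽◽??????」 carry the same six payload bits, so a decoder that only masks off the low six bits reads both as the same value; strict UTF-8 (RFC 3629), however, accepts only 「◾◽??????」 as a continuation byte.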
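
A short Python sketch of what this equivalence amounts to in practice, under the assumption that it refers to decoders that mask each trailing byte with 0x3F. The helper names `payload` and `is_strict_utf8` are hypothetical. The second helper relies on Python's built-in decoder, which is strict and therefore does not treat the two forms alike.

```python
def payload(b: int) -> int:
    """Keep only the low six bits, as a masking decoder would."""
    return b & 0x3F


def is_strict_utf8(data: bytes) -> bool:
    """Report whether Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 147 is 10010011; clearing the top bit gives 19, which is 00010011.
print(payload(147), payload(19))                  # → 19 19
print(is_strict_utf8(bytes([231, 147, 166])))     # → True
print(is_strict_utf8(bytes([231, 19, 166])))      # → False
```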

  1. https://graphemica.com/%C2%A1
  2. https://www.ascii-code.com/
  3. http://jendo.org/study/showChar.html
  4. http://jendo.org/study/seeDecode.php