UTF-8

From 福留子孫

Latest revision as of 17:42, 9 August 2023 (Wednesday)

Special characters

Characters with special meaning in HTML or URLs

Character        Entity name  Numeric reference
No-break space   &nbsp;       &#160;
<                &lt;         &#60;
>                &gt;         &#62;
"                &quot;       &#34;
'                &apos;       &#39;
&                &amp;        &#38;
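In PHP, the built-in htmlspecialchars() produces the entity names in the table above for the five characters that matter in HTML markup. The sketch below assumes PHP 8 and UTF-8 input. The wrapper name escape_html is hypothetical. ENT_HTML5 makes ' come out as &apos;, whereas the HTML 4.01 default gives &#039;. The no-break space is not touched by this function.

```php
<?php
// Hypothetical wrapper: escape the HTML-significant characters from the table above.
// ENT_QUOTES escapes both " and '; ENT_HTML5 turns ' into &apos; (not &#039;).
function escape_html(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo escape_html("<a href='x'>Tom & \"Jerry\"</a>"), "\n";
// &lt;a href=&apos;x&apos;&gt;Tom &amp; &quot;Jerry&quot;&lt;/a&gt;
```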

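Code points and characters

Meaning of UTF-8 bytes (◾ = a bit set to 1, ◽ = a bit set to 0, ? = a payload bit):

  • ◽???????, 0~127: for any byte B in UTF-8 text, if the first (highest) bit of B is 0, B represents a character on its own (ASCII);
  • ◾◽??????, 128~191: if the first bit of B is 1 and the second bit is 0, B is a continuation byte, i.e. one of the trailing bytes of a multi-byte (non-ASCII) character;
    the marker bits are fixed at 128, and the payload carries a value of 0~63.
  • ◾◾◽?????, 192~223: if the first two bits of B are 1 and the third is 0, B is the first byte of a 2-byte character;
    the marker bits are fixed at 192, and the payload contributes (0~31)×64.
  • ◾◾◾◽????, 224~239: if the first three bits of B are 1 and the fourth is 0, B is the first byte of a 3-byte character;
  • ◾◾◾◾◽???, 240~247: if the first four bits of B are 1 and the fifth is 0, B is the first byte of a 4-byte character.
Unicode ends at U+10FFFF and overlong forms are forbidden. As a result, the lead bytes 192, 193 and 245~247 never appear in valid UTF-8.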
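The byte list above is a set of bit-mask tests. The minimal PHP sketch below classifies one byte (0~255, e.g. a value returned by ord()) by its high-order bits. The function name and its return labels are hypothetical. Following the list, it does not separately flag 192, 193 or 245~247.

```php
<?php
// Hypothetical classifier: which role a byte (0-255) plays in UTF-8,
// decided by its high-order bits as in the list above.
function utf8_byte_kind(int $b): string
{
    if (($b & 0x80) === 0x00) return 'ascii';        // 0???????  0-127
    if (($b & 0xC0) === 0x80) return 'continuation'; // 10??????  128-191
    if (($b & 0xE0) === 0xC0) return 'lead2';        // 110?????  192-223
    if (($b & 0xF0) === 0xE0) return 'lead3';        // 1110????  224-239
    if (($b & 0xF8) === 0xF0) return 'lead4';        // 11110???  240-247
    return 'invalid';                                // 11111???  248-255
}

foreach ([65, 147, 200, 231, 240, 255] as $b) {
    printf("%3d %08b %s\n", $b, $b, utf8_byte_kind($b));
}
```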

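Composing a character (using the 3-byte Chinese character 「瓦」 as an example):

  1. First byte 231: 231-224=7, weight 4096
  2. Second byte 147: 147-128=19, weight 64
  3. Third byte 166: 166-128=38, weight 1
Each continuation byte carries 6 bits, so the weights are 64×64=4096, 64 and 1.
Code point = 7×4096+19×64+38=29926 (hexadecimal 74E6, i.e. U+74E6). The character reference is written &#29926;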
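The same arithmetic can be checked in PHP. The sketch below takes the UTF-8 bytes of 瓦 (written as escapes, so the source-file encoding does not matter), subtracts each byte's marker value and applies the place weights from the steps above. Only core functions are used: ord() and printf().

```php
<?php
// Reproduce the worked example: subtract each byte's marker value,
// then multiply by the place weight (4096, 64, 1).
$bytes = "\xE7\x93\xA6";            // UTF-8 bytes of U+74E6 (瓦)
$b1 = ord($bytes[0]);               // 231
$b2 = ord($bytes[1]);               // 147
$b3 = ord($bytes[2]);               // 166

$cp = ($b1 - 224) * 4096 + ($b2 - 128) * 64 + ($b3 - 128);
echo $cp, "\n";                     // 29926
echo '&#' . $cp . ';', "\n";        // &#29926;
printf("U+%04X\n", $cp);            // U+74E6
```

Subtracting 224 or 128 has the same effect as masking off the marker bits, because those bits are exactly the values being subtracted.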

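Converting between code points and characters:

Using PHP as an example.

(1) Generating the character for an integer code point

  1. chr(integer code point): produces a single-byte character, as in ASCII, ISO-8859 or Windows-1252. The integer ranges over 0~255. Passing a Unicode code point cannot produce a string in a multi-byte encoding such as UTF-8 or UTF-16. Values outside the valid range (0~255) are handled with the following algorithm:
    	while($bytevalue < 0){$bytevalue += 256;}
    	$bytevalue %= 256;
    Concatenating several chr(…) calls does yield a multi-byte UTF-8 character. For example, echo chr(240).chr(159).chr(144).chr(152); outputs 「🐘」.
    How many bytes a character's encoding takes has nothing to do with how wide its glyph is displayed.
  2. mb_chr(integer code point, 'UTF-8'): available when the mbstring extension is enabled (PHP 7.2 and later). It produces the character even for control codes.
  3. &#integer code point;: a numeric character reference written directly into HTML, which the browser renders as the character.
  4. html_entity_decode('&#integer code point;', ENT_NOQUOTES, 'UTF-8'): decodes the numeric character reference into the UTF-8 string for that character.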
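mb_chr() needs mbstring. The minimal sketch below assumes only core PHP and builds the same bytes from chr(), using the byte layout from the list of UTF-8 byte meanings: the code point is split into 6-bit groups and the marker bits are added. The function name cp_to_utf8 and the choice to reject surrogates and values above U+10FFFF are assumptions of this sketch.

```php
<?php
// Hypothetical encoder: integer code point -> UTF-8 string, using chr() only.
function cp_to_utf8(int $cp): string
{
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        throw new InvalidArgumentException("not a Unicode scalar value: $cp");
    }
    if ($cp < 0x80) {
        return chr($cp);                          // 0???????
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))             // 110?????
             . chr(0x80 | ($cp & 0x3F));          // 10??????
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))            // 1110????
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))                // 11110???
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Same bytes as chr(240).chr(159).chr(144).chr(152), and as decoding &#29926;
var_dump(cp_to_utf8(0x1F418) === "\xF0\x9F\x90\x98");                                  // bool(true)
var_dump(cp_to_utf8(29926) === html_entity_decode('&#29926;', ENT_NOQUOTES, 'UTF-8')); // bool(true)
```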

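(2) Finding the code point of a character

  1. ord(string): converts the first byte of the string into a value between 0 and 255. For a multi-byte UTF-8 character this is only the lead byte, not the code point.
  2. mb_ord(string, 'encoding'): returns the Unicode code point of the first character of the string, read in the given encoding, e.g. UTF-8, ISO-8859-1 or Windows-1252.
  3. htmlentities(string[, flags, 'charset', double_encode]): converts the characters in the string that have named HTML entities into those entities (e.g. & becomes &amp;). Characters without a named entity, such as most CJK characters, are left unchanged rather than turned into &#…; references.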
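ord() stops at the first byte. The minimal sketch below decodes the whole first character with ord() alone, which is roughly what mb_ord($s, 'UTF-8') returns. It assumes non-empty, well-formed UTF-8 and rejects only a bad lead byte or a truncated sequence. Overlong forms and surrogates are not checked. The function name utf8_first_cp is hypothetical.

```php
<?php
// Hypothetical decoder: code point of the first character of a UTF-8 string.
function utf8_first_cp(string $s): int
{
    if ($s === '') {
        throw new InvalidArgumentException('empty string');
    }
    $b = ord($s[0]);
    if ($b < 0x80) {
        return $b;                                  // 0???????: ASCII
    }
    if ($b < 0xC0) {
        throw new InvalidArgumentException('starts with a continuation byte');
    } elseif ($b < 0xE0) {
        [$len, $cp] = [2, $b & 0x1F];               // 110?????: keep 5 bits
    } elseif ($b < 0xF0) {
        [$len, $cp] = [3, $b & 0x0F];               // 1110????: keep 4 bits
    } elseif ($b < 0xF8) {
        [$len, $cp] = [4, $b & 0x07];               // 11110???: keep 3 bits
    } else {
        throw new InvalidArgumentException('invalid lead byte');
    }
    if (strlen($s) < $len) {
        throw new InvalidArgumentException('truncated sequence');
    }
    for ($i = 1; $i < $len; $i++) {
        $cp = ($cp << 6) | (ord($s[$i]) & 0x3F);    // append 6 payload bits
    }
    return $cp;
}

$wa = "\u{74E6}";                       // 瓦, bytes E7 93 A6
echo ord($wa), "\n";                    // 231 (lead byte only)
echo utf8_first_cp($wa), "\n";          // 29926
echo utf8_first_cp("\u{1F418}"), "\n";  // 128024
```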

1-byte encoding:

0~31 control characters

DEC  OCT  HEX  BIN       Symbol  HTML Number  HTML Name  Description
0    000  00   00000000  NUL     &#00;                   Null character
1    001  01   00000001  SOH     &#01;                   Start of Heading
2    002  02   00000010  STX     &#02;                   Start of Text
3    003  03   00000011  ETX     &#03;                   End of Text
4    004  04   00000100  EOT     &#04;                   End of Transmission
5    005  05   00000101  ENQ     &#05;                   Enquiry
6    006  06   00000110  ACK     &#06;                   Acknowledge
7    007  07   00000111  BEL     &#07;                   Bell, Alert
8    010  08   00001000  BS      &#08;                   Backspace
9    011  09   00001001  HT      &#09;                   Horizontal Tab
10   012  0A   00001010  LF      &#10;                   Line Feed
11   013  0B   00001011  VT      &#11;                   Vertical Tabulation
12   014  0C   00001100  FF      &#12;                   Form Feed
13   015  0D   00001101  CR      &#13;                   Carriage Return
14   016  0E   00001110  SO      &#14;                   Shift Out
15   017  0F   00001111  SI      &#15;                   Shift In
16   020  10   00010000  DLE     &#16;                   Data Link Escape
17   021  11   00010001  DC1     &#17;                   Device Control One (XON)
18   022  12   00010010  DC2     &#18;                   Device Control Two
19   023  13   00010011  DC3     &#19;                   Device Control Three (XOFF)
20   024  14   00010100  DC4     &#20;                   Device Control Four
21   025  15   00010101  NAK     &#21;                   Negative Acknowledge
22   026  16   00010110  SYN     &#22;                   Synchronous Idle
23   027  17   00010111  ETB     &#23;                   End of Transmission Block
24   030  18   00011000  CAN     &#24;                   Cancel
25   031  19   00011001  EM      &#25;                   End of medium
26   032  1A   00011010  SUB     &#26;                   Substitute
27   033  1B   00011011  ESC     &#27;                   Escape
28   034  1C   00011100  FS      &#28;                   File Separator
29   035  1D   00011101  GS      &#29;                   Group Separator
30   036  1E   00011110  RS      &#30;                   Record Separator
31   037  1F   00011111  US      &#31;                   Unit Separator
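The DEC, OCT, HEX, BIN and HTML Number columns are all the same byte value in different notations. The sketch below regenerates them for any byte with sprintf(), whose %o, %X and %b conversions give octal, upper-case hexadecimal and binary. The helper name table_row is hypothetical. The %02d pad mirrors the two-digit &#00;…&#09; style used in the table.

```php
<?php
// Hypothetical helper: one table row (DEC OCT HEX BIN HTML-number) for a byte value.
function table_row(int $n): string
{
    return sprintf('%d %03o %02X %08b &#%02d;', $n, $n, $n, $n, $n);
}

echo table_row(10), "\n";   // 10 012 0A 00001010 &#10;
echo table_row(127), "\n";  // 127 177 7F 01111111 &#127;
```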

32~127 printable characters (127, DEL, is itself a control character)

DEC  OCT  HEX  BIN       Symbol  HTML Number  HTML Name  Description
32   040  20   00100000  SP      &#32;                   Space
33   041  21   00100001  !       &#33;        &excl;     Exclamation mark
34   042  22   00100010  "       &#34;        &quot;     Double quotes (or speech marks)
35   043  23   00100011  #       &#35;        &num;      Number sign
36   044  24   00100100  $       &#36;        &dollar;   Dollar
37   045  25   00100101  %       &#37;        &percnt;   Per cent sign
38   046  26   00100110  &       &#38;        &amp;      Ampersand
39   047  27   00100111  '       &#39;        &apos;     Single quote
40   050  28   00101000  (       &#40;        &lparen;   Open parenthesis (or open bracket)
41   051  29   00101001  )       &#41;        &rparen;   Close parenthesis (or close bracket)
42   052  2A   00101010  *       &#42;        &ast;      Asterisk
43   053  2B   00101011  +       &#43;        &plus;     Plus
44   054  2C   00101100  ,       &#44;        &comma;    Comma
45   055  2D   00101101  -       &#45;                   Hyphen-minus
46   056  2E   00101110  .       &#46;        &period;   Period, dot or full stop
47   057  2F   00101111  /       &#47;        &sol;      Slash or divide
48   060  30   00110000  0       &#48;                   Zero
49   061  31   00110001  1       &#49;                   One
50   062  32   00110010  2       &#50;                   Two
51   063  33   00110011  3       &#51;                   Three
52   064  34   00110100  4       &#52;                   Four
53   065  35   00110101  5       &#53;                   Five
54   066  36   00110110  6       &#54;                   Six
55   067  37   00110111  7       &#55;                   Seven
56   070  38   00111000  8       &#56;                   Eight
57   071  39   00111001  9       &#57;                   Nine
58   072  3A   00111010  :       &#58;        &colon;    Colon
59   073  3B   00111011  ;       &#59;        &semi;     Semicolon
60   074  3C   00111100  <       &#60;        &lt;       Less than (or open angled bracket)
61   075  3D   00111101  =       &#61;        &equals;   Equals
62   076  3E   00111110  >       &#62;        &gt;       Greater than (or close angled bracket)
63   077  3F   00111111  ?       &#63;        &quest;    Question mark
64   100  40   01000000  @       &#64;        &commat;   At sign
65   101  41   01000001  A       &#65;                   Uppercase A
66   102  42   01000010  B       &#66;                   Uppercase B
67   103  43   01000011  C       &#67;                   Uppercase C
68   104  44   01000100  D       &#68;                   Uppercase D
69   105  45   01000101  E       &#69;                   Uppercase E
70   106  46   01000110  F       &#70;                   Uppercase F
71   107  47   01000111  G       &#71;                   Uppercase G
72   110  48   01001000  H       &#72;                   Uppercase H
73   111  49   01001001  I       &#73;                   Uppercase I
74   112  4A   01001010  J       &#74;                   Uppercase J
75   113  4B   01001011  K       &#75;                   Uppercase K
76   114  4C   01001100  L       &#76;                   Uppercase L
77   115  4D   01001101  M       &#77;                   Uppercase M
78   116  4E   01001110  N       &#78;                   Uppercase N
79   117  4F   01001111  O       &#79;                   Uppercase O
80   120  50   01010000  P       &#80;                   Uppercase P
81   121  51   01010001  Q       &#81;                   Uppercase Q
82   122  52   01010010  R       &#82;                   Uppercase R
83   123  53   01010011  S       &#83;                   Uppercase S
84   124  54   01010100  T       &#84;                   Uppercase T
85   125  55   01010101  U       &#85;                   Uppercase U
86   126  56   01010110  V       &#86;                   Uppercase V
87   127  57   01010111  W       &#87;                   Uppercase W
88   130  58   01011000  X       &#88;                   Uppercase X
89   131  59   01011001  Y       &#89;                   Uppercase Y
90   132  5A   01011010  Z       &#90;                   Uppercase Z
91   133  5B   01011011  [       &#91;        &lsqb;     Opening bracket
92   134  5C   01011100  \       &#92;        &bsol;     Backslash
93   135  5D   01011101  ]       &#93;        &rsqb;     Closing bracket
94   136  5E   01011110  ^       &#94;        &Hat;      Caret - circumflex
95   137  5F   01011111  _       &#95;        &lowbar;   Underscore
96   140  60   01100000  `       &#96;        &grave;    Grave accent
97   141  61   01100001  a       &#97;                   Lowercase a
98   142  62   01100010  b       &#98;                   Lowercase b
99   143  63   01100011  c       &#99;                   Lowercase c
100  144  64   01100100  d       &#100;                  Lowercase d
101  145  65   01100101  e       &#101;                  Lowercase e
102  146  66   01100110  f       &#102;                  Lowercase f
103  147  67   01100111  g       &#103;                  Lowercase g
104  150  68   01101000  h       &#104;                  Lowercase h
105  151  69   01101001  i       &#105;                  Lowercase i
106  152  6A   01101010  j       &#106;                  Lowercase j
107  153  6B   01101011  k       &#107;                  Lowercase k
108  154  6C   01101100  l       &#108;                  Lowercase l
109  155  6D   01101101  m       &#109;                  Lowercase m
110  156  6E   01101110  n       &#110;                  Lowercase n
111  157  6F   01101111  o       &#111;                  Lowercase o
112  160  70   01110000  p       &#112;                  Lowercase p
113  161  71   01110001  q       &#113;                  Lowercase q
114  162  72   01110010  r       &#114;                  Lowercase r
115  163  73   01110011  s       &#115;                  Lowercase s
116  164  74   01110100  t       &#116;                  Lowercase t
117  165  75   01110101  u       &#117;                  Lowercase u
118  166  76   01110110  v       &#118;                  Lowercase v
119  167  77   01110111  w       &#119;                  Lowercase w
120  170  78   01111000  x       &#120;                  Lowercase x
121  171  79   01111001  y       &#121;                  Lowercase y
122  172  7A   01111010  z       &#122;                  Lowercase z
123  173  7B   01111011  {       &#123;       &lcub;     Opening brace
124  174  7C   01111100  |       &#124;       &verbar;   Vertical bar
125  175  7D   01111101  }       &#125;       &rcub;     Closing brace
126  176  7E   01111110  ~       &#126;                  Equivalency sign - tilde
127  177  7F   01111111  DEL     &#127;                  Delete

No HTML name is listed for 126. &tilde; is sometimes given for it, but in HTML &tilde; stands for U+02DC SMALL TILDE (˜), not the ASCII tilde ~ (U+007E).
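The named and numeric references in the table can be checked with PHP's html_entity_decode(). The sketch below assumes PHP 8 with the ENT_HTML5 entity table, which is needed for HTML5-only names such as &dollar; and &lowbar;. It also shows why &tilde; does not belong to 126: the name decodes to U+02DC, which is two bytes in UTF-8, not to the single byte ~.

```php
<?php
// Decode numeric and named references side by side using the HTML5 entity table.
$flags = ENT_QUOTES | ENT_HTML5;

var_dump(html_entity_decode('&#36;&dollar;', $flags, 'UTF-8'));   // string(2) "$$"
var_dump(html_entity_decode('&#95;&lowbar;', $flags, 'UTF-8'));   // string(2) "__"

// &tilde; is U+02DC SMALL TILDE, not the ASCII ~ (U+007E):
var_dump(html_entity_decode('&tilde;', $flags, 'UTF-8') === "\u{02DC}"); // bool(true)
```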

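2-byte encoding:

Equivalence

「◾◾◽◽◽◽◽◽, ◾◽??????」 and 「◽◽??????」 decode to the same code point (0~63), but they are not both legal. The 2-byte form is an overlong encoding, which RFC 3629 forbids, so a conforming decoder must reject it. Only the 1-byte form is valid UTF-8.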
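The overlong case can be demonstrated in PHP. The sketch below applies the 2-byte formula to C0 81 (11000000 10000001) and gets code point 1, the same value as the single byte 01. A 2-byte form is only acceptable for code points of 0x80 and above. As one readily available strict check, it uses preg_match() with the u modifier, which validates the subject as UTF-8 and returns false on invalid input. This relies on PCRE's built-in UTF-8 validation.

```php
<?php
// Decode C0 81 with the 2-byte formula: 5 bits from the lead byte, 6 from the continuation.
$overlong = "\xC0\x81";
$cp = ((ord($overlong[0]) & 0x1F) << 6) | (ord($overlong[1]) & 0x3F);
echo $cp, "\n";                                   // 1
echo $cp < 0x80 ? "overlong\n" : "ok\n";          // overlong: 2-byte forms must encode 0x80 or more

// preg_match('//u', ...) returns false when the subject is not valid UTF-8.
var_dump(preg_match('//u', $overlong));           // bool(false)
var_dump(preg_match('//u', "\x01"));              // int(1)
```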

  1. https://graphemica.com/%C2%A1
  2. https://www.ascii-code.com/
  3. http://jendo.org/study/showChar.html
  4. http://jendo.org/study/seeDecode.php