Character encoding problem
2022-07-22 16:25:00 【Every day without dancing is a betrayal of life】
1. What is Unicode?
- Unicode is a character set.
- It assigns every character in the world a unique ID (a code point).
- Several encoding rules implement the Unicode character set, such as UTF-8, UTF-16, and UTF-32.
- For example, the Unicode character set only specifies that the character 你 ("you") has the code point U+4F60. How many bytes that character occupies once encoded depends on the encoding rule you choose.
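The split between "character set" and "encoding rule" can be checked directly. The sketch below is only an illustration: the sample character 你 and the variable names are assumptions chosen for the demo, and the `-le` codec variants are used so that Python does not prepend a byte order mark to the output.

```python
# One character, one code point, several possible byte representations.
ch = "你"

# The character set (Unicode) fixes the number assigned to the character.
code_point = ord(ch)
print(f"U+{code_point:04X}")

# The encoding rule decides how many bytes that number occupies.
# The "-le" variants are used so no byte order mark is added to the output.
sizes = {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}
print(sizes)
```

The code point stays the same in every case; only the byte count changes with the chosen encoding.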
2. What is ANSI?
- ANSI is also a character set.
- ASCII assigns a unique ID to each of its 128 characters: English letters, digits, and common symbols.
- ANSI is an extension of ASCII. The first 128 codes still represent English letters, digits, and common symbols. The codes after that represent the characters of one particular country.
- For example, the ANSI encoding for China is GB2312 and the ANSI encoding for Japan is Shift_JIS. In other words, each country has its own standard.
- The biggest drawback is that the ANSI encodings of different languages are incompatible with each other: the same byte values stand for different characters in each one, so text that mixes several languages turns into garbled characters.
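The garbling described above can be reproduced with a short sketch. It assumes Python's built-in `gb2312` and `shift_jis` codecs as stand-ins for the Chinese and Japanese ANSI encodings; the sample string is made up for the demo.

```python
# Text stored on a system whose ANSI encoding is GB2312.
text = "abc中文"
raw = text.encode("gb2312")

# The ASCII range is shared, so the first three bytes are plain ASCII.
ascii_part = raw[:3]

# Reading the same bytes as Shift_JIS misinterprets the non-ASCII part.
# errors="replace" keeps the demo from raising on undecodable byte sequences.
garbled = raw.decode("shift_jis", errors="replace")

# Reading them with the encoding they were written in restores the text.
restored = raw.decode("gb2312")

print(garbled)
print(restored)
```

Only the ASCII prefix survives the wrong decoder, which is why mixed-language text was unreliable before a single shared character set existed.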
3. What is UTF-8?
- UTF-8 is an encoding rule: one way of encoding the Unicode character set.
- It is a variable-length encoding whose code unit is 8 bits. A code point is encoded as 1 to 4 bytes.
- English letters, digits, and common symbols take 1 byte.
- Most Chinese characters take 3 bytes; a few rarely used Chinese characters take 4 bytes.
- For a single-byte character, the first bit of the byte is set to 0. English text therefore takes one byte per character in UTF-8, the same size as in ASCII.
- For a character of n bytes (1 < n <= 4), the first n bits of the first byte are set to 1 and bit n+1 is set to 0. The first two bits of every following byte are set to 10. The remaining bits of these n bytes are filled with the character's Unicode code point, padded with zeros in the high-order positions.
0xxxxxxx
110xxxxx 10xxxxxx
1110xxxx 10xxxxxx 10xxxxxx
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
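The rules and bit templates above can be turned into a small hand-written encoder. This is a minimal sketch for illustration, not how any library implements UTF-8: the function name `utf8_encode` is hypothetical, and the sketch does not reject surrogate code points (U+D800 to U+DFFF), which a strict encoder would refuse.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point by filling the bit templates by hand."""
    if code_point < 0:
        raise ValueError("code point must not be negative")
    if code_point < 0x80:            # fits in 7 bits:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:           # fits in 11 bits: 110xxxxx 10xxxxxx
        n = 2
    elif code_point < 0x10000:       # fits in 16 bits: 1110xxxx + 2 continuation bytes
        n = 3
    elif code_point < 0x110000:      # fits in 21 bits: 11110xxx + 3 continuation bytes
        n = 4
    else:
        raise ValueError("code point out of Unicode range")

    # Build the continuation bytes from the low-order end, 6 bits at a time,
    # each one prefixed with the bits 10.
    tail = []
    for _ in range(n - 1):
        tail.append(0b10000000 | (code_point & 0b00111111))
        code_point >>= 6

    # Lead byte: n one-bits, then a zero, then the remaining high-order bits.
    lead_prefix = (0xFF << (8 - n)) & 0xFF
    lead = lead_prefix | code_point
    return bytes([lead] + tail[::-1])


# Compare the hand-written result with Python's built-in encoder.
for ch in ["A", "é", "你", "😀"]:
    print(ch, utf8_encode(ord(ch)).hex(" "), ch.encode("utf-8").hex(" "))
```

The four sample characters need 1, 2, 3, and 4 bytes respectively, so each template is exercised once.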
4. What are UTF-16 and UTF-32?
- UTF-16 encodes a code point as 2 bytes or 4 bytes.
- UTF-32 uniformly encodes every code point as 4 bytes.
- For text that is mostly English, these two encodings use space less efficiently than UTF-8, but their encoding rules are simpler.
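The size trade-off can be measured with a quick comparison. The sketch below uses three sample characters chosen for the demo (an assumption, not from the original text) and the `-le` codec variants so that no byte order mark is counted.

```python
# Bytes needed per character in each encoding: (UTF-8, UTF-16, UTF-32).
samples = ["A", "你", "😀"]

byte_counts = {
    ch: (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),   # "-le" avoids counting a byte order mark
        len(ch.encode("utf-32-le")),
    )
    for ch in samples
}

for ch, counts in byte_counts.items():
    print(ch, counts)
```

ASCII characters are where UTF-8 saves the most space. For a common Chinese character UTF-16 is actually one byte smaller than UTF-8, so the comparison depends on the text being stored.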