Compiler division optimization
2022-07-20 08:01:00 【Sili who can't write code】
On modern CPUs, division costs far more clock cycles than other arithmetic instructions.
Compare the latency of add and div:
div has a latency of 60-80 cycles while add needs only 1, so real code should avoid the division instruction whenever possible.
Solution:
Optimizing division by a power of 2
When the divisor is a power of 2, the division can be done with a shift.
For example:
int x = 58;
// x = x / 8
x >>= 3;
For negative numbers, however, a plain shift gives the wrong result: -58 >> 3 is -8, whereas -58 / 8 is -7.
For a negative dividend a and a positive divisor b we can use the following formula instead:
ceil(a / b) = floor((a + b - 1) / b)
For instance:
a = -58
b = 8
The exact result is -7.25, and the computer has to round it up to -7 rather than down to -8. This example fits the formula:
floor((a + b - 1) / b) = floor((-58 + 8 - 1) / 8) = floor(-6.375) = -7
So the division can be eliminated in this way:
int x = -58;
// If the dividend is less than 0, use the formula to eliminate the division
if (x < 0)
{
// x = (x + 8 - 1) / 8; => (x + 7) >> 3;
// An arithmetic right shift rounds down
x = (x + 7) >> 3;
}
else {
x >>= 3;
}
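The branch above can be packaged as a small function and compared against the real division. This is only a sketch: the names div8_shift and check_div8_shift are made up for illustration, and the code assumes that >> on a negative int is an arithmetic shift, which C leaves implementation-defined but which holds on mainstream x86 compilers.

```c
#include <assert.h>

/* Divide by 8 using shifts only, truncating toward zero like C's `/`.
   Assumption: `>>` on a negative int is an arithmetic shift. */
static int div8_shift(int x)
{
    if (x < 0)
        return (x + 7) >> 3;   /* bias by (8 - 1), then round down */
    return x >> 3;             /* non-negative: a plain shift is enough */
}

/* Hypothetical self-check: compare with the real division over a small range. */
static void check_div8_shift(void)
{
    for (int x = -1000; x <= 1000; ++x)
        assert(div8_shift(x) == x / 8);
}
```

Calling check_div8_shift() should pass silently if the shift assumption holds on the target compiler.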
Let's look at the following code :
#include<stdio.h>
int main(int argc,char* args[])
{
printf("%d\r\n",argc/8);
return 0;
}
After compilation it becomes the following assembly instructions:
; move argc into a register
mov eax,dword ptr [argc]
; sign-extend eax to 64 bits; the high 32 bits go to edx
cdq
; handle positive and negative numbers: edx is 0 for a positive number and all 1s for a negative one,
; so after this instruction edx is 0 for a positive number and 7 for a negative one
and edx,7
; the same as our (x + 7), except that the 7 may be 0
add eax,edx
; arithmetic right shift of eax by 3 bits, which preserves the sign
sar eax,3
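The cdq / and / add / sar sequence can be mirrored one instruction at a time in C. The sketch below is an illustration rather than actual compiler output; div8_branchless and its check function are hypothetical names, and arithmetic right shift of negative values is again assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless signed division by 8, one C statement per instruction.
   Assumption: `>>` on a negative int32_t is an arithmetic shift. */
static int32_t div8_branchless(int32_t x)
{
    int32_t sign = x >> 31;    /* cdq: 0 if x >= 0, all ones (-1) if x < 0 */
    int32_t bias = sign & 7;   /* and edx,7: 0 or 7 */
    return (x + bias) >> 3;    /* add eax,edx ; sar eax,3 */
}

/* Hypothetical self-check against the real division. */
static void check_div8_branchless(void)
{
    for (int32_t x = -4096; x <= 4096; ++x)
        assert(div8_branchless(x) == x / 8);
    assert(div8_branchless(INT32_MIN) == INT32_MIN / 8);
    assert(div8_branchless(INT32_MAX) == INT32_MAX / 8);
}
```

The bias is added only for negative inputs, so the addition cannot overflow.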
Optimizing division by a non-power of 2
We can use the following formula:
a / c = a * (2^n / c) / 2^n = (a * M) >> n, where M = 2^n / c
The inverse formula is as follows:
c = 2^n / M
Tip: 2^n / c is never an exact integer, because 2^n is a power of 2 and c is not, so the shifted result is rounded down by default. Keep this in mind when dealing with negative numbers.
M is a constant, so the compiler can compute it at compile time. n is at least 32 (the larger n is, the more accurate the result).
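As a rough illustration of how such a constant can be derived and inverted, the sketch below computes M = 2^n / c rounded up and then recovers c from M. The helper names are hypothetical, and rounding M up is an assumption, chosen because it reproduces the constants quoted in this article (0AAAAAAABh for c = 3, n = 33, and 2^32 + 24924925h for c = 7, n = 35).

```c
#include <assert.h>
#include <stdint.h>

/* M = 2^n / c, rounded up. Assumes c is not a power of 2 and n < 64. */
static uint64_t magic_multiplier(uint32_t c, unsigned n)
{
    return (((uint64_t)1 << n) / c) + 1;
}

/* Inverse formula c = 2^n / M, rounded to the nearest integer. */
static uint32_t recover_divisor(uint64_t m, unsigned n)
{
    uint64_t two_n = (uint64_t)1 << n;
    return (uint32_t)((two_n + m / 2) / m);
}

/* Hypothetical self-check using the constants from this article. */
static void check_magic(void)
{
    assert(magic_multiplier(3, 33) == 0xAAAAAAABull);
    assert(magic_multiplier(7, 35) == 0x124924925ull);  /* 2^32 + 24924925h */
    assert(recover_divisor(0xAAAAAAABull, 33) == 3);
    assert(recover_divisor(0x124924925ull, 35) == 7);
}
```

The second helper is what a reverse engineer does by hand when reading the constant and the shift count out of a disassembly.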
Examples are as follows:
Unsigned division
#include<stdio.h>
int main(unsigned int argc, char* args[])
{
// Note that argc is unsigned, so the division below is an unsigned division
printf("y ===>>> %u\r\n", argc / 3);
return 0;
}
The corresponding assembly, where 0AAAAAAABh is the M in our formula:
; eax = M
mov eax,0AAAAAAABh
; edx holds the high 32 bits and eax the low 32 bits of the product; call the product ret. argc is a, so
; (edx,eax) = ret = eax * argc = a * M
mul eax,dword ptr [argc]
; edx is the high 32 bits, so shifting it right by 1 bit is equivalent to shifting ret right by 33 bits,
; that is, edx = ret >> 33 = ret / 2^33
shr edx,1
Applying the inverse formula recovers the divisor:
c = 2^33 / 0AAAAAAABh = 8589934592 / 2863311531 = 2.9999999997, which gives 3
As you can see, no division instruction appears anywhere in the code above.
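The multiply-and-shift sequence for an unsigned divisor of 3 can be checked in C with a 64-bit product. This is a sketch with hypothetical function names, using the constant 0AAAAAAABh and the total shift of 33 described above.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned division by 3 without a division instruction. */
static uint32_t udiv3(uint32_t a)
{
    uint64_t product = (uint64_t)a * 0xAAAAAAABu;  /* mul: edx:eax = a * M */
    return (uint32_t)(product >> 33);              /* take edx, then shr edx,1 */
}

/* Hypothetical self-check against the real division. */
static void check_udiv3(void)
{
    for (uint32_t a = 0; a < 100000u; ++a)
        assert(udiv3(a) == a / 3);
    assert(udiv3(UINT32_MAX) == UINT32_MAX / 3);
    assert(udiv3(UINT32_MAX - 1) == (UINT32_MAX - 1) / 3);
}
```

Shifting the 64-bit product by 33 is the same as taking the high 32 bits (edx) and shifting them right by 1.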
In addition, a powerful tool such as IDA Pro can quickly recognize this pattern for us.
Signed Division
#include<stdio.h>
int main( int argc, char* args[])
{
// Note that argc is signed
printf("y ===>>> %d\r\n", argc / 3);
return 0;
}
Compared with unsigned division, a few extra instructions appear:
0040104B mov eax,edx
; 1Fh is 31 in decimal
0040104D shr eax,1Fh
00401050 add eax,edx
These extra lines avoid the default round-down problem in negative division: when the result is negative, 1 has to be added (see above).
; copy the high half of the product a*M into eax
0040104B mov eax,edx
; 1Fh is 31 in decimal
; after a logical right shift by 31 only the sign bit is left, that is, 1 or 0
; 1 for a negative number, 0 for a positive number
0040104D shr eax,1Fh
; if the number is negative, eax is 1, so 1 is added to the quotient
00401050 add eax,edx
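The sign-bit correction can be tried out in C. The listing above does not show the multiplier, so the constant 55555556h (2^32 / 3 rounded up, with n = 32) is an assumption here, and the function names are hypothetical. Arithmetic right shift of negative values is assumed as well.

```c
#include <assert.h>
#include <stdint.h>

/* Signed division by 3 via multiply, high half, and sign-bit correction.
   Assumptions: M = 55555556h with n = 32, and arithmetic `>>` on negatives. */
static int32_t sdiv3(int32_t a)
{
    int64_t product = (int64_t)a * 0x55555556LL;  /* imul: edx:eax = a * M */
    int32_t hi = (int32_t)(product >> 32);        /* edx, rounded down */
    uint32_t sign = (uint32_t)hi >> 31;           /* mov eax,edx ; shr eax,1Fh */
    return hi + (int32_t)sign;                    /* add eax,edx */
}

/* Hypothetical self-check against the real division. */
static void check_sdiv3(void)
{
    for (int32_t a = -100000; a <= 100000; ++a)
        assert(sdiv3(a) == a / 3);
    assert(sdiv3(INT32_MIN) == INT32_MIN / 3);
    assert(sdiv3(INT32_MAX) == INT32_MAX / 3);
}
```

For a negative dividend the high half is one less than the truncated quotient, and the extracted sign bit supplies exactly that missing 1.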
Variant when M exceeds 32 bits
#include<stdio.h>
int main(unsigned int argc, char* args[])
{
// Note that argc is unsigned
printf("y ===>>> %d\r\n", argc / 7);
return 0;
}
Converted into a mathematical formula, the generated assembly computes:
argc / 7 = (argc * (2^32 + M)) >> 35
where M is 24924925h, while the real M of our original formula is 2^32 + 24924925h.
Obviously a 32-bit register cannot hold this number, so the code is rearranged; the result is the same as dividing the full product by 2^35.
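The compiler's listing for this case is not reproduced above, so the sketch below uses a commonly seen rearrangement as an assumption: take the high half hi of a * 24924925h, then compute (((a - hi) >> 1) + hi) >> 2. Since ((a - hi) >> 1) + hi equals (a + hi) >> 1 without overflowing 32 bits, the total is (a * 2^32 + a * M) >> 35. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned division by 7 where the true multiplier 2^32 + 24924925h
   needs 33 bits. Assumed instruction pattern, not taken from the article. */
static uint32_t udiv7(uint32_t a)
{
    uint32_t hi = (uint32_t)(((uint64_t)a * 0x24924925u) >> 32); /* high half of a * M */
    uint32_t t = ((a - hi) >> 1) + hi;  /* (a + hi) / 2 without overflow */
    return t >> 2;                      /* total shift: 32 + 1 + 2 = 35 */
}

/* Hypothetical self-check against the real division. */
static void check_udiv7(void)
{
    for (uint32_t a = 0; a < 100000u; ++a)
        assert(udiv7(a) == a / 7);
    assert(udiv7(UINT32_MAX) == UINT32_MAX / 7);
    assert(udiv7(UINT32_MAX - 6) == (UINT32_MAX - 6) / 7);
}
```

The subtraction a - hi cannot underflow because hi is always smaller than a / 7 + 1.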
Multiplication optimization
Suppose only a 16-bit multiplier is available and we need to perform the following calculation:
Compute A * 8086h (8086h is stored in ax, A is stored in cx),
where 8086h is an unsigned number and A is a word-sized signed value entered by the user.
If we simply use mul cx, cx is treated as an unsigned number, although the user may have entered a negative number.
If we use imul cx, ax is treated as a negative number, although ax actually holds an unsigned number.
Let's first look at the following formulas:
; x is a negative word-sized value
x + ~x = ffffh
x + ~x + 1 = 10000h
; ~x+1 is the two's complement (negation) of x
~x+1 = 10000h - x
x = 10000h - (~x+1)
; So:
; read as a signed 16-bit value, 8086h is -32634 in decimal; its magnitude is ~8086h+1 = 10000h-8086h = 32634
; the leading minus sign in -(10000h-8086h) gives the value imul actually uses (8086h is a negative number for a 16-bit imul)
dx.ax = A * -(10000h-8086h)
dx.ax = A * (8086h-10000h)
dx.ax = 8086h*A - 10000h*A
dx.ax + 10000h*A = 8086h*A
(dx+A).ax = 8086h*A
From the formula above we get the following assembly code for the problem:
mov ax,8086h
mov cx,A
imul cx;
add dx,cx
This sequence needs no change when A is negative: imul already treats cx as signed, and adding cx to dx is correct modulo 2^32 for either sign of A.
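The four instructions can be modelled in C to see what they produce for each input. This sketch is only a model of the 16-bit registers with hypothetical names; it returns the raw 32-bit pattern of dx:ax, and the check loops over every 16-bit value of A.

```c
#include <assert.h>
#include <stdint.h>

/* Model of: mov ax,8086h ; mov cx,A ; imul cx ; add dx,cx
   Returns the raw bit pattern of dx:ax. */
static uint32_t mul_8086h_bits(int16_t a)
{
    int32_t ax_signed = 0x8086 - 0x10000;        /* imul reads 8086h as -32634 */
    int32_t product = (int32_t)a * ax_signed;    /* imul cx: dx:ax = A * ax */
    uint16_t ax = (uint16_t)product;                     /* low word */
    uint16_t dx = (uint16_t)((uint32_t)product >> 16);   /* high word */
    dx = (uint16_t)(dx + (uint16_t)a);           /* add dx,cx */
    return ((uint32_t)dx << 16) | ax;
}

/* Hypothetical self-check: compare with the plain 32-bit product A * 8086h. */
static void check_mul_8086h(void)
{
    for (int32_t a = -32768; a <= 32767; ++a)
        assert(mul_8086h_bits((int16_t)a) == (uint32_t)(a * 0x8086));
}
```

Comparing bit patterns as uint32_t avoids relying on implementation-defined conversions back to a signed type.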
Now suppose we want to compute A * -8086h using only 16-bit registers, although -8086h is outside the 16-bit signed range.
Because
-x = (~x+1)-10000h
the low 16 bits of -8086h are ~8086h+1 = 10000h-8086h = 7F7Ah, and that is the value loaded into ax.
Therefore:
dx.ax = A * (10000h-8086h)
dx.ax = 10000h*A - 8086h*A
dx.ax - 10000h*A = -8086h*A
(dx-A).ax = -8086h*A
; ax = low 16 bits of -8086h
mov ax,7F7Ah
mov cx,A
imul cx;
sub dx,cx
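The negative-constant case can be modelled the same way. The sketch assumes that ax is loaded with 7F7Ah, the low 16 bits of -8086h, before imul; with that assumption, subtracting cx from dx yields A * -8086h. The names are hypothetical and the result is again the raw dx:ax bit pattern.

```c
#include <assert.h>
#include <stdint.h>

/* Model of: mov ax,7F7Ah ; mov cx,A ; imul cx ; sub dx,cx
   Assumption: ax holds 7F7Ah = 10000h - 8086h, the low word of -8086h. */
static uint32_t mul_neg_8086h_bits(int16_t a)
{
    int32_t product = (int32_t)a * 0x7F7A;       /* imul cx: 7F7Ah is positive */
    uint16_t ax = (uint16_t)product;                     /* low word */
    uint16_t dx = (uint16_t)((uint32_t)product >> 16);   /* high word */
    dx = (uint16_t)(dx - (uint16_t)a);           /* sub dx,cx */
    return ((uint32_t)dx << 16) | ax;
}

/* Hypothetical self-check: compare with the plain 32-bit product A * -8086h. */
static void check_mul_neg_8086h(void)
{
    for (int32_t a = -32768; a <= 32767; ++a)
        assert(mul_neg_8086h_bits((int16_t)a) == (uint32_t)(a * -0x8086));
}
```

For A = 1 the model gives FFFF7F7Ah, which is -32902, the expected value of -8086h.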
Optimizing division by a negative number
#include<stdio.h>
int main( int argc, char* args[]){
printf("y ===>>> %d\r\n", argc / -5);
return 0;
}
Let's first review the division optimization formula from above:
a / c = (a * M) >> n, where M = 2^n / c, so c = 2^n / M
When c is negative, M is negative as well.
In this example M is 99999999h, which is a negative number stored in two's complement.
Its decimal value is -1717986919, so the divisor can be recovered by doing the calculation in decimal:
c = 2^33 / -1717986919 = -4.999999998, which gives -5
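A C model of this case is sketched below. The instruction pattern (imul by 99999999h, sar edx,1, then the sign-bit correction from the signed example) is an assumption, since the listing is not reproduced here; the function names are hypothetical, and arithmetic right shift of negative values is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Signed division by -5 with M = 99999999h read as a signed value, n = 33.
   Assumed pattern: imul ; sar edx,1 ; mov eax,edx ; shr eax,1Fh ; add eax,edx */
static int32_t sdiv_neg5(int32_t a)
{
    const int64_t m = (int64_t)0x99999999u - ((int64_t)1 << 32); /* -1717986919 */
    int32_t hi = (int32_t)(((int64_t)a * m) >> 32);  /* imul: edx = high half */
    hi >>= 1;                                        /* sar edx,1 */
    return hi + (int32_t)((uint32_t)hi >> 31);       /* add 1 if negative */
}

/* Hypothetical self-check against the real division. */
static void check_sdiv_neg5(void)
{
    for (int32_t a = -100000; a <= 100000; ++a)
        assert(sdiv_neg5(a) == a / -5);
    assert(sdiv_neg5(INT32_MIN) == INT32_MIN / -5);
    assert(sdiv_neg5(INT32_MAX) == INT32_MAX / -5);
}
```

Because M itself is negative, the sign of the quotient flips automatically and only the usual round-down correction remains.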
Optimizing division by a negative number, part 2
#include<stdio.h>
int main( int argc, char* args[])
{
printf("y ===>>> %d\r\n", argc / -7);
return 0;
}
The M that the compiler actually means here is not 6DB6DB6Dh. The constant really stands for a negative number whose magnitude is outside the 4-byte signed range, which is why the generated assembly takes this form.
Let Y = 6DB6DB6Dh and let A be the dividend.
Because
-x = (~x+1)-100000000h
therefore:
; M is less than 0, hence the minus sign
edx.eax = A * -(100000000h-Y)
therefore M = -(100000000h-Y)
M = -(100000000h-6DB6DB6Dh) = -92492493h
Once M is known, the divisor can be recovered: c = 2^34 / -92492493h = -6.999999998, which gives -7
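The same idea can be modelled in C. The instruction pattern is an assumption, because the listing is not reproduced here: multiply by Y = 6DB6DB6Dh, subtract the dividend from the high half (which folds in the -2^32 * A term), shift right arithmetically by 2 for n = 34, then apply the sign-bit correction. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Signed division by -7 with M = 6DB6DB6Dh - 2^32 = -92492493h, n = 34.
   Assumed pattern: imul ; sub edx,A ; sar edx,2 ; shr eax,1Fh ; add eax,edx */
static int32_t sdiv_neg7(int32_t a)
{
    int32_t hi = (int32_t)(((int64_t)a * 0x6DB6DB6DLL) >> 32); /* high half of A * Y */
    hi -= a;                                     /* (A*Y - 2^32*A) >> 32 */
    hi >>= 2;                                    /* arithmetic shift, n = 34 */
    return hi + (int32_t)((uint32_t)hi >> 31);   /* add 1 if negative */
}

/* Hypothetical self-check against the real division. */
static void check_sdiv_neg7(void)
{
    for (int32_t a = -100000; a <= 100000; ++a)
        assert(sdiv_neg7(a) == a / -7);
    assert(sdiv_neg7(INT32_MIN) == INT32_MIN / -7);
    assert(sdiv_neg7(INT32_MAX) == INT32_MAX / -7);
}
```

Subtracting the dividend from the high half is the 32-bit counterpart of the sub dx,cx step in the 16-bit multiplication section.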