Feature Engineering -- numerical feature normalization
2022-07-20 16:14:00 【Shepherd ll】
Chapter 1: Feature Engineering
As the name suggests, feature engineering is the engineering processing of raw data: it refines the raw data into features that serve as the input to algorithms and models.
This chapter mainly discusses two common data types:
- Structured data. Structured data can be viewed as a table in a relational database: every column has a clear definition and belongs to one of two basic types, numerical or categorical. Each row of data represents the information of one sample.
- Unstructured data. Unstructured data mainly includes text, images, audio, and video. The information it contains cannot be represented by a simple number, there are no clearly defined categories, and the size of each piece of data varies.
01 Feature normalization
To eliminate the effect of different units (dimensions) between data features, we need to normalize the features so that different indicators become comparable. For example, suppose we analyze how a person's height and weight affect health. If we use meters (m) and kilograms (kg) as units, height will fall in the range 1.6 to 1.8 m while weight will fall in the range 50 to 100 kg, so the results of the analysis will be dominated by weight, whose values vary far more. To get more accurate results, we need to apply feature normalization so that every indicator is on the same order of magnitude before the analysis.
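A small worked example makes this dominance concrete. The sketch below uses three hypothetical people and compares Euclidean distances, which stand in here for "similarity" in an analysis (that choice, the people, and the `rescale` helper are assumptions for illustration). Each feature is rescaled into [0, 1] using the ranges quoted above (1.6 to 1.8 m, 50 to 100 kg).

```python
import math

# Ranges quoted in the text: height 1.6 to 1.8 m, weight 50 to 100 kg
HEIGHT_RANGE = (1.6, 1.8)
WEIGHT_RANGE = (50.0, 100.0)


def rescale(person):
    """Map (height_m, weight_kg) into [0, 1] per feature using the fixed ranges."""
    h, w = person
    h_scaled = (h - HEIGHT_RANGE[0]) / (HEIGHT_RANGE[1] - HEIGHT_RANGE[0])
    w_scaled = (w - WEIGHT_RANGE[0]) / (WEIGHT_RANGE[1] - WEIGHT_RANGE[0])
    return (h_scaled, w_scaled)


# Hypothetical people: (height in m, weight in kg)
a = (1.60, 60.0)
b = (1.80, 62.0)  # 20 cm taller than a, nearly the same weight
c = (1.61, 70.0)  # nearly the same height as a, 10 kg heavier

raw_ab, raw_ac = math.dist(a, b), math.dist(a, c)
scaled_ab = math.dist(rescale(a), rescale(b))
scaled_ac = math.dist(rescale(a), rescale(c))
print(f"raw:    d(a,b)={raw_ab:.3f}  d(a,c)={raw_ac:.3f}")
print(f"scaled: d(a,b)={scaled_ab:.3f}  d(a,c)={scaled_ac:.3f}")
```

In raw units, a comes out much closer to b than to c (about 2.01 versus 10.0) even though b is at the opposite end of the height range, simply because kilogram values are numerically larger. After rescaling, c is the closer one (about 0.21 versus 1.00).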
1. Why do we need to normalize numerical features?
Normalizing numerical features brings all features into roughly the same numerical range.
The two most common methods are:
- Linear function normalization (Min-Max Scaling)
It applies a linear transformation to the original data and maps the results into the range [0, 1], rescaling the data proportionally: $x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}}$, where $x_{min}$ and $x_{max}$ are the minimum and maximum values of the feature.
- Zero-mean normalization (Z-Score Normalization)
It maps the original data to a distribution with mean 0 and standard deviation 1. Specifically, if the original feature has mean $\mu$ and standard deviation $\sigma$, the normalized value is $z = \frac{x - \mu}{\sigma}$.
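Both formulas are easy to apply column by column, one column per feature. The following is a minimal NumPy sketch; the helper names `min_max_scale` and `z_score`, the sample height/weight rows, and the guard that leaves constant columns at 0 instead of dividing by zero are illustrative assumptions, not part of the original text.

```python
import numpy as np


def min_max_scale(x):
    """Min-Max Scaling per column: (x - x_min) / (x_max - x_min), result in [0, 1]."""
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    # Assumed guard: a constant column has zero range, so divide by 1 and map it to 0
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (x - x_min) / span


def z_score(x):
    """Z-Score Normalization per column: (x - mu) / sigma, giving mean 0 and std 1."""
    x = np.asarray(x, dtype=float)
    mu = x.mean(axis=0)
    sigma = x.std(axis=0)  # population standard deviation (ddof=0)
    sigma = np.where(sigma > 0, sigma, 1.0)  # same assumed guard for constant columns
    return (x - mu) / sigma


# Hypothetical rows: height in m, weight in kg
data = np.array([[1.60, 50.0],
                 [1.70, 70.0],
                 [1.75, 85.0],
                 [1.80, 100.0]])
print(min_max_scale(data))
print(z_score(data))
```

In practice, scikit-learn's `MinMaxScaler` and `StandardScaler` implement the same two transforms; `StandardScaler` also uses the population standard deviation, as `np.std` does by default.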
Why do we need to normalize numerical features? Stochastic gradient descent offers a practical illustration of why normalization matters. Suppose there are two numerical features, and the range of x1 is much larger than that of x2.
With the same learning rate, x1 is updated faster than x2, so more iterations are needed to find the optimal solution. If x1 and x2 are normalized to the same numerical range, the contours of the objective function change from ellipses to circles, x1 and x2 are updated at more consistent speeds, and gradient descent finds the optimal solution more easily and quickly.
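The effect on convergence can be checked numerically. This is a minimal sketch under several assumptions: NumPy, a linear-regression objective (half the mean squared error, with an intercept), and synthetic data where x1 spans 0 to 100 and x2 spans 0 to 1 (both ranges hypothetical). It uses plain batch gradient descent rather than the stochastic variant so the iteration counts are deterministic. Instead of literally sharing one learning rate, each run uses step size 1/L, where L is the largest eigenvalue of that run's Hessian, which is roughly the largest step that keeps gradient descent stable on this objective. Iterations are counted until the weights are within a relative 1e-6 of the exact least-squares solution.

```python
import numpy as np


def gradient_descent_iterations(X, y, tol=1e-6, max_iter=10_000):
    """Batch gradient descent on 0.5 * mean squared error of a linear model with
    intercept; returns iterations needed to get within relative `tol` of the
    least-squares optimum, or `max_iter` if that never happens."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])          # prepend an intercept column
    H = A.T @ A / n                                # Hessian of the objective
    lr = 1.0 / np.linalg.eigvalsh(H).max()         # step size 1/L (L = largest curvature)
    w_star = np.linalg.lstsq(A, y, rcond=None)[0]  # exact least-squares optimum
    w = np.zeros(A.shape[1])
    for it in range(1, max_iter + 1):
        grad = A.T @ (A @ w - y) / n
        w -= lr * grad
        if np.linalg.norm(w - w_star) <= tol * np.linalg.norm(w_star):
            return it
    return max_iter


rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(0, 100, n)  # hypothetical feature with a wide range
x2 = rng.uniform(0, 1, n)    # hypothetical feature with a narrow range
y = 0.05 * x1 + 3.0 * x2 + rng.normal(0, 0.1, n)

X_raw = np.column_stack([x1, x2])
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)  # Z-Score Normalization

raw_iters = gradient_descent_iterations(X_raw, y)
std_iters = gradient_descent_iterations(X_std, y)
print("raw features:         ", raw_iters)
print("standardized features:", std_iters)
```

With these assumed settings, the raw run is expected to exhaust the 10,000-iteration cap: the elongated (elliptical) loss surface forces a step size small enough for x1 that progress along the x2 direction is very slow. The standardized run has nearly circular contours and converges in far fewer steps.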
Of course, data normalization is not a cure-all. In practice, models solved by gradient descent usually require normalization, including linear regression, logistic regression, support vector machines, and neural networks. It is not applicable to decision tree models, however. When a decision tree splits a node, it mainly relies on the information gain ratio of dataset D with respect to feature x, and the information gain ratio does not depend on whether the feature has been normalized, because normalization does not change the information gain of feature x on the samples.
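The claim that normalization leaves the split criterion unchanged can be checked with a small sketch. It assumes a single numerical feature split by thresholds ("value <= t"), a common way decision trees handle continuous features, and computes the information gain ratio as information gain divided by the entropy of the split itself; the data and function names are hypothetical. Because Min-Max Scaling (like Z-Score Normalization with a nonzero standard deviation) is monotonically increasing, it preserves the order of values and therefore produces exactly the same candidate partitions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def best_gain_ratio(values, labels):
    """Best information gain ratio over binary threshold splits 'value <= t'."""
    n = len(labels)
    base = entropy(labels)
    pairs = sorted(zip(values, labels))  # only the ordering of values matters
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        split_info = entropy([0] * i + [1] * (n - i))  # entropy of the split itself
        best = max(best, gain / split_info)
    return best


# Hypothetical feature (e.g. age) and binary class labels
ages = [22, 25, 31, 38, 45, 52, 60]
labels = [0, 0, 1, 0, 1, 1, 1]

lo, hi = min(ages), max(ages)
ages_scaled = [(a - lo) / (hi - lo) for a in ages]  # Min-Max Scaling

raw_ratio = best_gain_ratio(ages, labels)
scaled_ratio = best_gain_ratio(ages_scaled, labels)
print(raw_ratio, scaled_ratio)  # identical: scaling keeps the order of values
```

Only the relative order of the feature values enters the computation, which is why rescaling cannot change which split the tree picks.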