Machine Learning - Ensemble Learning
2022-07-22 13:45:00 【InfoQ】
1. What is ensemble learning
- Bagging builds each training set by sampling with replacement (bootstrap sampling) and trains multiple classifiers independently on those sets. New data is predicted by a majority vote of the classifiers. The typical algorithm is the random forest.
- Boosting builds a strong classifier by iteratively reducing bias: each later weak classifier pays more attention to the samples that the earlier weak classifiers misclassified. Typical algorithms are AdaBoost and GBDT (gradient boosted decision trees).
- In Bagging the training sets, and therefore the base classifiers, are independent of one another; in Boosting each round's training set is adjusted based on the previous round's results, so training cannot be parallelized.
- In Bagging the base classifiers' predictions are weighted equally; in Boosting the predictions are weighted.
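The contrast above can be sketched with scikit-learn's generic ensemble wrappers: `BaggingClassifier` trains each tree on an independent bootstrap sample, while `AdaBoostClassifier` reweights the data after every round. This is a minimal illustration on the iris data; `n_estimators=10` and the depth-1 base trees are arbitrary choices, not values from the original article.

```python
# Bagging vs. Boosting with the same weak learner (decision stumps)
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Bagging: each stump sees an independent bootstrap sample,
# so the 10 stumps could in principle be trained in parallel
bagging = BaggingClassifier(DecisionTreeClassifier(max_depth=1),
                            n_estimators=10, random_state=0)
bagging.fit(X, y)

# Boosting: stumps are fit sequentially; each round upweights
# the samples the previous stumps misclassified
boosting = AdaBoostClassifier(n_estimators=10, random_state=0)
boosting.fit(X, y)

print(bagging.score(X, y), boosting.score(X, y))
```

On this tiny dataset both reach high training accuracy; the structural difference is in how the ensemble members are trained, not in the API.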

- Random forest
- AdaBoost
- GBDT (gradient boosted decision trees)
2. Random forests
- (1) Each decision tree is an expert in a narrow field (each tree learns from m features randomly chosen out of the M available), so a random forest contains many experts specialized in different fields.
- (2) A new problem (new input data) can then be viewed from these different perspectives: the experts each give an opinion, and the final answer is decided by their vote.
- (3) Key parameters: n_estimators, the number of decision trees; max_features, the number of randomly selected features per split.

# Train a random forest on the iris dataset
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X = iris.data
y = iris.target

model = RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1)
model.fit(X, y)
print(model.score(X, y))         # accuracy on the training set
print(model.predict(X[[1, 2]]))  # predict two of the training samples
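Note that `model.score(X, y)` above measures accuracy on the same data the forest was trained on, which is optimistic. A quick cross-validated estimate gives a more honest number; this sketch uses `cross_val_score` with `cv=5`, a choice not taken from the original article.

```python
# Cross-validated accuracy of the same random forest configuration
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(max_depth=5, n_estimators=10,
                               max_features=1, random_state=0)

# 5-fold cross-validation: train on 4 folds, score on the held-out fold
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```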
3. AdaBoost

# Train an AdaBoost classifier on the iris dataset
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier

iris = load_iris()
X = iris.data
y = iris.target

model = AdaBoostClassifier(n_estimators=20)
model.fit(X, y)
print(model.score(X, y))           # accuracy on the training set
print(model.predict(X[[100, 2]]))  # predict two of the training samples
4. GBDT (gradient boosted decision trees)

# Train a gradient boosting classifier on the iris dataset
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

iris = load_iris()
X = iris.data
y = iris.target

model = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
                                   max_depth=1, random_state=0)
model.fit(X, y)
print(model.score(X, y))           # accuracy on the training set
print(model.predict(X[[100, 2]]))  # predict two of the training samples