Watermelon Book Chapter 2 Notes: Evaluation Methods
2022-07-21 22:41:00 【kurok_】
Basic metrics
Error rate: the proportion of test samples that are misclassified.
Accuracy: 1 - error rate.
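As a small illustration of the two definitions above, here is a minimal sketch in plain Python. The function names `error_rate` and `accuracy` are illustrative choices, not from any particular library; labels are compared with simple equality.

```python
def error_rate(y_true, y_pred):
    """Fraction of test samples whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def accuracy(y_true, y_pred):
    """Accuracy is simply 1 - error rate."""
    return 1.0 - error_rate(y_true, y_pred)


y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0]
print(error_rate(y_true, y_pred))  # 0.4 (2 of 5 samples are wrong)
print(accuracy(y_true, y_pred))    # 0.6
```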
Methods for splitting the data into a training set and a test set:
1. Hold-out method
The whole dataset D is split into a training set S and a test set T that do not overlap: D = S ∪ T, S ∩ T = ∅.
A common choice is a 3:7 split, with 30% of the data as the test set and 70% as the training set. Stratified sampling is generally used so that the proportion of positive and negative samples is the same in the training set and the test set.
A single hold-out split is often unreliable, so the random split is usually repeated several times and the results are averaged.
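The hold-out procedure above (stratified split, repeated several times, scores averaged) can be sketched as follows. This is an illustrative sketch only: `stratified_holdout`, `repeated_holdout` and the `evaluate(train_idx, test_idx)` callback are hypothetical names, and `evaluate` stands in for "train a model on the training indices and score it on the test indices".

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_ratio=0.3, seed=None):
    """Return (train_idx, test_idx) so that each class keeps roughly the same
    proportion in the training and test sets (stratified sampling)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_ratio)  # per-class test size
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def repeated_holdout(labels, evaluate, repeats=10, test_ratio=0.3):
    """Repeat the random stratified split `repeats` times and average the scores.
    `evaluate` is a hypothetical user-supplied callback returning a score."""
    scores = []
    for r in range(repeats):
        train_idx, test_idx = stratified_holdout(labels, test_ratio, seed=r)
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / len(scores)


labels = [1] * 60 + [0] * 40
train_idx, test_idx = stratified_holdout(labels, test_ratio=0.3, seed=0)
print(len(train_idx), len(test_idx))     # 70 30
print(sum(labels[i] for i in test_idx))  # 18 positives, i.e. 60% as in D
```

Splitting each class separately is what keeps the positive/negative ratio identical in S and T; a plain shuffle of all indices would only match it on average.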
2. Cross-validation
k-fold cross-validation splits all samples in D into k mutually exclusive subsets of similar size. Each round uses k-1 of them as the training set and the remaining one as the test set; this is repeated k times, so every subset serves as the test set exactly once, and the k results are averaged.
There are many possible ways to split D into k subsets, so k-fold cross-validation is usually repeated p times with different random splits. This is called p-times k-fold cross-validation.
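A minimal sketch of k-fold and p-times k-fold cross-validation as described above. For brevity the folds here are formed by a plain random shuffle rather than stratified sampling, and `k_fold_splits`, `p_times_k_fold` and the `evaluate` callback are hypothetical names used only for illustration.

```python
import random


def k_fold_splits(m, k, seed=None):
    """Split indices 0..m-1 into k mutually exclusive folds of (nearly) equal
    size and yield (train_idx, test_idx) once per fold."""
    if not 2 <= k <= m:
        raise ValueError("k must satisfy 2 <= k <= m")
    idx = list(range(m))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin assignment to folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train_idx, test_idx


def p_times_k_fold(m, k, p, evaluate):
    """p-times k-fold CV: run k-fold with p different random splits and
    average all p * k scores. `evaluate` is a hypothetical callback."""
    scores = [evaluate(tr, te)
              for r in range(p)
              for tr, te in k_fold_splits(m, k, seed=r)]
    return sum(scores) / len(scores)


for tr, te in k_fold_splits(10, 5, seed=0):
    print(len(tr), len(te))  # "8 2", printed five times
```

Using a different seed for each of the p repetitions is what makes the splits differ; with the same seed every repetition would produce identical folds.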
If D contains m samples and we take k = m, we get leave-one-out (LOO). LOO is not affected by random sample splitting, and because each training set (m - 1 samples) is very close to D, its estimate is often fairly accurate. The downside is cost: it requires training m models, which takes a lot of time when m is large.
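Leave-one-out is just the k = m case, so there is only one way to split. The sketch below assumes a hypothetical toy learner (`one_nn`, a 1-nearest-neighbour classifier on 1-D points) purely to have something to evaluate; the function names are illustrative.

```python
def leave_one_out(m):
    """k-fold with k = m: each sample is the test set exactly once,
    so the split is deterministic (no random partitioning)."""
    for i in range(m):
        train_idx = [j for j in range(m) if j != i]
        yield train_idx, [i]


def loo_estimate(m, evaluate):
    # m separate training runs: this is where the time cost of LOO comes from.
    scores = [evaluate(tr, te) for tr, te in leave_one_out(m)]
    return sum(scores) / m


xs = [0.1, 0.2, 0.3, 0.9, 1.0, 1.1]
ys = [0, 0, 0, 1, 1, 1]


def one_nn(train_idx, test_idx):
    """Hypothetical toy learner: 1-nearest-neighbour on 1-D points; returns accuracy."""
    correct = 0
    for t in test_idx:
        nearest = min(train_idx, key=lambda j: abs(xs[j] - xs[t]))
        correct += ys[nearest] == ys[t]
    return correct / len(test_idx)


print(loo_estimate(len(xs), one_nn))  # 1.0
```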
3. Bootstrapping
Neither of the methods above trains on all of D (leave-one-out comes closest, but it takes too much time). Bootstrapping was proposed to solve this.
Given a dataset D containing m samples, draw one sample from D with replacement, m times in total (note: the resulting set has exactly as many samples as D!). This gives D', which is used as the training set, while D\D' (the samples never drawn) is used as the test set. A given sample is never drawn with probability (1 - 1/m)^m, which approaches 1/e ≈ 0.368 as m grows, so roughly 36.8% of D ends up in the test set. Test results obtained this way are called out-of-bag estimates.
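The bootstrap split can be sketched in a few lines. This is an illustrative sketch with a hypothetical function name `bootstrap_split`; it works on indices 0..m-1 standing in for the samples of D, and it also checks empirically that the out-of-bag fraction lands near (1 - 1/m)^m.

```python
import random


def bootstrap_split(m, seed=None):
    """Draw m indices from 0..m-1 with replacement: this is the training set D'
    (it may contain duplicates). Indices never drawn form the out-of-bag
    test set (D minus D')."""
    rng = random.Random(seed)
    train_idx = [rng.randrange(m) for _ in range(m)]
    drawn = set(train_idx)
    test_idx = [i for i in range(m) if i not in drawn]
    return train_idx, test_idx


m = 10_000
train_idx, test_idx = bootstrap_split(m, seed=0)
print(len(train_idx))     # 10000, same size as D
print(len(test_idx) / m)  # should be close to the value below
print((1 - 1 / m) ** m)   # about 0.368
```

Note that D' contains repeated samples, which is exactly why the bootstrap changes the data distribution.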
Bootstrapping is useful when the dataset is small and hard to split effectively into training and test sets. It can also generate many different training sets from the initial dataset, which is helpful for ensemble learning.
It has a drawback, though: it changes the distribution of the initial dataset and thus introduces estimation bias. When there is enough data, hold-out and cross-validation are therefore preferable.