Pattern matching: The gestalt approach, a sequence-based text similarity method
2020-11-06 01:28:00 【Elementary school students in IT field】
When reposting, please credit the original: https://blog.csdn.net/HHTNAN
Pattern matching: The gestalt approach
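Python's difflib module can compare the similarity of two sequences directly, with no word segmentation (tokenization) required.
Note: the example strings below are English renderings of the original Chinese queries. The outputs shown were computed on the Chinese originals (13 characters each in cases 2 to 4), so running the English text gives different values.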
Case study 1
import difflib
a = "Do uterine fibroids minimally invasive surgery with how much money"
b = "What does tinea cruris look like? How to treat tinea cruris good?"
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.06666666666666667
Case study 2
import difflib
a = "Do uterine fibroids minimally invasive surgery with how much money"
b = "Do uterine fibroids minimally invasive surgery specific costs"
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.769230769
Case study 3
import difflib
a = "Do uterine fibroids minimally invasive surgery with how much money"
b = "Specific cost to do uterine fibroids minimally invasive surgery"
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.6923076923076923
Case study 4
import difflib
a = "Do uterine fibroids minimally invasive surgery with how much money"
b = "Specific cost of uterine fibroids to do minimally invasive surgery"
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.6153846153846154
The cases above show that the algorithm measures sequence similarity only, that is, which characters appear in the same order. It ignores the meaning and semantics of the text.
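A small sketch of that point, using short English words chosen here purely for illustration (they are not from the original tests): two unrelated words that share letters score much higher than two synonyms that do not. The helper name similarity is an assumption of this example.

```python
import difflib


def similarity(a, b):
    """Character-sequence similarity of two strings, as used in the cases above."""
    return difflib.SequenceMatcher(None, a, b).ratio()


# "car" and "cat" mean different things but share the run "ca":
# 2 matched characters out of 6 in total, so roughly 0.67.
print(similarity("car", "cat"))

# "car" and "automobile" are synonyms but share only the letter "a":
# 1 matched character out of 13 in total, so roughly 0.15.
print(similarity("car", "automobile"))
```

A score from this method is therefore best read as "how much of the spelling lines up", not "how close the meanings are".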
The score returned by the algorithm is twice the number of matching characters it finds, divided by the total number of characters in the two strings. The algorithm as originally described returns the score as an integer percentage; difflib's ratio() instead returns a float between 0 and 1, as the outputs above show.
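The formula can be checked by hand. The sketch below recomputes the score as 2*M/T from the blocks reported by get_matching_blocks() and compares it with ratio(). The function name ratio_from_blocks and the sample strings are assumptions made for this illustration.

```python
import difflib


def ratio_from_blocks(a, b):
    """Recompute the similarity score as 2*M/T from the matching blocks."""
    matcher = difflib.SequenceMatcher(None, a, b)
    # Each block is (start in a, start in b, size). The last block is a
    # zero-length sentinel, so summing the sizes gives the matched count M.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    total = len(a) + len(b)  # T: combined length of both sequences
    return 2.0 * matched / total if total else 1.0


a, b = "abcd", "bcde"
# The only matching block is "bcd": M = 3, T = 8, so 2*3/8 = 0.75.
print(ratio_from_blocks(a, b))
print(difflib.SequenceMatcher(None, a, b).ratio())
```

Both lines print the same value, which suggests ratio() is exactly this quotient for short inputs like these.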
My working guess at how the score is calculated is as follows.
When the matching characters do not line up in position, as in case 3, the score is 9/13: 9 is the length of the longest common substring and 13 is the length of each string (2*9/26). Case 4 gives 8/13, which can be read as (4+4)/13, two common runs of 4 characters each. So why does case 2, whose longest common substring is also 9 characters, score higher? My first guess was that a match in the same position earns +1, giving (9+1)/13. However, difflib documents ratio() as 2.0*M/T, where M is the number of matched characters summed over all matching blocks and T is the combined length of both sequences. There is no position bonus, so the extra 1 in case 2 must be one more matched character outside the 9-character run. These readings are based on experiments only; they have not been verified against the paper and are not authoritative. I will find the paper, read it, and tidy this up later. (Note that the matching process works from the characters of b.)
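One way to probe the 9/13 versus 10/13 difference is to list the matching blocks for strings with the same shape. The strings below are hypothetical ASCII stand-ins of 13 characters each, not the article's data, and the helper name describe is an assumption of this sketch.

```python
import difflib

# Hypothetical stand-in strings, 13 characters each.
a = "ABCDEFGHIxyzw"
b_same_order = "ABCDEFGHIpqrx"  # shared 9-char run first, a stray "x" after it
b_moved = "pqrxABCDEFGHI"       # same characters, shared run moved to the end


def describe(a, b):
    """Return (matched substrings, ratio) for a compared against b."""
    matcher = difflib.SequenceMatcher(None, a, b)
    blocks = [a[m.a:m.a + m.size]
              for m in matcher.get_matching_blocks() if m.size]
    return blocks, matcher.ratio()


# Blocks "ABCDEFGHI" and "x": M = 10, so the score is 10/13.
print(describe(a, b_same_order))

# Only "ABCDEFGHI" matches: the "x" in b now sits before the long run,
# while in a it sits after it, so it cannot be paired. M = 9, score 9/13.
print(describe(a, b_moved))
```

In this sketch the extra point comes from one more matched character that keeps its relative order, not from a bonus for matching at the same index.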
Case study 5
import difflib
a = "10 Anemia in a month old baby"
b = "10 A month old baby has nosebleed"
print(difflib.SequenceMatcher(None, a, b).ratio())
Output:
0.8235294117647058
With 7 matching characters, len(a) = 8 and len(b) = 9 in the original strings: 2*7/(len(a)+len(b)) = 14/(8+9) = 14/17 = 0.8235294117647058