ES Custom Analyzers
2022-07-21 02:26:00 【文晓武】
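What an analyzer is
An analyzer is a wrapper that combines three kinds of functions, which run in the following order:
- Character filters process the raw input string. They can strip specific characters or replace them with other characters.
- Tokenizer splits the string into individual tokens. An analyzer must have exactly one tokenizer.
- Token filters receive the tokens in order and can modify, add, or remove tokens. For example, the lowercase filter converts all letters to lowercase.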
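The order of the three stages can be sketched in plain Python. This is only an illustration of the pipeline, not Elasticsearch code: `analyze`, `replace_ampersand`, `whitespace_tokenizer`, and `lowercase_filter` are hypothetical names invented for this example.

```python
from typing import Callable, List

# Hypothetical stand-ins for the three stages; none of these are Elasticsearch APIs.
CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def analyze(text: str,
            char_filters: List[CharFilter],
            tokenizer: Tokenizer,
            token_filters: List[TokenFilter]) -> List[str]:
    # 1. Character filters rewrite the raw string, in order.
    for char_filter in char_filters:
        text = char_filter(text)
    # 2. Exactly one tokenizer splits the string into tokens.
    tokens = tokenizer(text)
    # 3. Token filters run in order and may modify, add, or remove tokens.
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


def replace_ampersand(text: str) -> str:
    # Example character filter: turn "&" into the word "and".
    return text.replace("&", " and ")


def whitespace_tokenizer(text: str) -> List[str]:
    # Example tokenizer: split on runs of whitespace.
    return text.split()


def lowercase_filter(tokens: List[str]) -> List[str]:
    # Example token filter: lowercase every token.
    return [token.lower() for token in tokens]


result = analyze("Tom & Jerry", [replace_ampersand], whitespace_tokenizer, [lowercase_filter])
print(result)  # ['tom', 'and', 'jerry']
```

The lists of character filters and token filters may be empty, but the tokenizer argument is always a single function, which mirrors the "exactly one tokenizer" rule.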
Custom analyzers
Define the character filters, tokenizers, and token filters in their respective sections under analysis, in this format:
PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": { ... character filters ... },
      "tokenizer": { ... tokenizers ... },
      "filter": { ... token filters ... },
      "analyzer": { ... analyzers ... }
    }
  }
}
A definition from real-world use:
{
  "analysis": {
    "filter": {
      "my_stopwords": {
        "type": "stop",
        "stopwords": [
          "the",
          "a"
        ]
      }
    },
    "tokenizer": {
      "trigram_tokenizer": {
        "type": "ngram",
        "min_gram": 1,
        "max_gram": 3,
        "token_chars": [
          "letter",
          "digit"
        ]
      }
    },
    "analyzer": {
      "trigram_analyzer": {
        "tokenizer": "trigram_tokenizer",
        "filter": [
          "lowercase",
          "my_stopwords"
        ]
      }
    }
  }
}
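This defines an analyzer named trigram_analyzer. It uses the custom tokenizer trigram_tokenizer, an ngram tokenizer that treats only letters (letter) and digits (digit) as token characters and emits grams of 1 to 3 characters. Note that token_chars is a tokenizer setting, not a character filter; this analyzer defines no character filters. The tokens then pass through two token filters in order: lowercase converts them to lowercase, and my_stopwords removes the tokens the and a. On Elasticsearch 7 and later, index.max_ngram_diff defaults to 1, so a min_gram of 1 with a max_gram of 3 also requires raising that index setting to at least 2.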
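To get a feel for what trigram_analyzer produces, the following Python sketch approximates its behavior offline. It is a rough simulation under stated assumptions, not Elasticsearch itself: `ngram_tokenize` and `trigram_analyze` are hypothetical helpers, and the regular expression only approximates the letter and digit character classes. On a real cluster, the `_analyze` API with `"analyzer": "trigram_analyzer"` shows the actual tokens.

```python
import re
from typing import List

# Same stop words as the my_stopwords filter in the definition above.
STOPWORDS = {"the", "a"}


def ngram_tokenize(text: str, min_gram: int = 1, max_gram: int = 3) -> List[str]:
    """Approximate an ngram tokenizer whose token_chars are letter and digit."""
    tokens = []
    # Any character that is not a letter or digit acts as a separator.
    for chunk in re.findall(r"[^\W_]+", text):
        # For each start position, emit grams of every allowed length.
        for start in range(len(chunk)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(chunk):
                    tokens.append(chunk[start:start + size])
    return tokens


def trigram_analyze(text: str) -> List[str]:
    """Tokenize, then apply the lowercase and stop-word filters in that order."""
    tokens = ngram_tokenize(text)
    tokens = [token.lower() for token in tokens]
    return [token for token in tokens if token not in STOPWORDS]


print(trigram_analyze("The cat"))
# ['t', 'th', 'h', 'he', 'e', 'c', 'ca', 'cat', 'at', 't']
```

Because the stop filter runs on grams rather than whole words, it drops the gram "the" and also the single-letter gram "a" taken from inside "cat". Whether that is desirable depends on the use case.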