NLP model BERT: from introduction to mastery (2)
2020-11-06 01:22:00 【Elementary school students in IT field】
Named entity recognition
First, install the corresponding BERT package:
pip install bert-base==0.0.9 -i https://pypi.python.org/simple
You can also follow the installation instructions on the official website.
The package currently supports:
1. Training named entity recognition models
2. A client/server (C/S) service for named entity recognition
3. All BERT services inherited from the excellent open source project bert_as_service (hanxiao)
4. A text classification service
More features will be added over time.
Training a named entity recognition model from the command line:
After bert-base is installed, two command-line tools are generated. One of them, bert-base-ner-train, trains a named entity recognition model. You only need to specify the directory of the training data and the locations of the relevant BERT files. The tool's help output lists all available options.
An example training command is as follows:
bert-base-ner-train \
    -data_dir {your dataset dir} \
    -output_dir {training output dir} \
    -init_checkpoint {Google BERT model dir} \
    -bert_config_file {bert_config.json under the Google BERT model dir} \
    -vocab_file {vocab.txt under the Google BERT model dir}
Parameter description
data_dir: the directory that holds your data. The training, validation, and test files must be named train.txt, dev.txt, and test.txt respectively. Name the files exactly this way, otherwise an error will be reported.
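Before starting a training run, it can help to confirm that the data directory contains the three required files. The following is a minimal sketch of such a check; the helper name `missing_dataset_files` is hypothetical and is not part of bert-base.

```python
import os

# File names required by the training tool, as described above.
REQUIRED_FILES = ("train.txt", "dev.txt", "test.txt")


def missing_dataset_files(data_dir):
    """Return the required file names that are absent from data_dir.

    Hypothetical helper for illustration; bert-base does not provide it.
    """
    return [
        name
        for name in REQUIRED_FILES
        if not os.path.isfile(os.path.join(data_dir, name))
    ]


if __name__ == "__main__":
    # Replace "." with the directory you plan to pass as -data_dir.
    missing = missing_dataset_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All required files are present.")
```

An empty list means the directory is ready to be passed as -data_dir.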
The format of the training data is as follows:
海 O
钓 O
比 O
赛 O
地 O
点 O
在 O
厦 B-LOC
门 I-LOC
与 O
金 B-LOC
门 I-LOC
之 O
间 O
的 O
海 O
域 O
。 O
The first item on each line is the token and the second is its label. The two are separated by a single space ' '; make sure you use a space and not another separator. Sentences are separated by blank lines. The program reads data in this format automatically.
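To make the format concrete, here is a minimal sketch of how such a file could be read and how B-/I- labels could be grouped into entities. It is not the reader that bert-base uses internally; the function names and the sample tokens (a, b, c, ...) are made up for illustration.

```python
def parse_ner_text(text):
    """Parse 'token label' lines into sentences of (token, label) pairs."""
    sentences, current = [], []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        # Split on the last space so the label is always the final field.
        # A line without a space raises ValueError here.
        token, label = line.rsplit(" ", 1)
        current.append((token, label))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence):
    """Group B-/I- labels into (entity_text, entity_type) pairs."""
    entities, chars, etype = [], [], None
    for token, label in sentence:
        if label.startswith("B-"):
            # A new entity starts; flush the previous one if any.
            if chars:
                entities.append(("".join(chars), etype))
            chars, etype = [token], label[2:]
        elif label.startswith("I-") and chars and label[2:] == etype:
            # Continue the entity that is currently open.
            chars.append(token)
        else:
            # An "O" label (or a stray I- label) closes any open entity.
            if chars:
                entities.append(("".join(chars), etype))
            chars, etype = [], None
    if chars:
        entities.append(("".join(chars), etype))
    return entities


if __name__ == "__main__":
    # Made-up sample in the same layout as train.txt.
    sample = "a O\nb B-LOC\nc I-LOC\nd O\n\ne B-LOC\n"
    for sentence in parse_ner_text(sample):
        print(extract_entities(sentence))
```

Tokens are joined without a separator because the training data is labeled one character per line; for word-level data you would join with a space instead.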
output_dir: the output path for the trained model. Model checkpoints and some label mapping tables are stored here. When you later run the service, this path can be passed as -ner_model_dir.
init_checkpoint: the pretrained Google BERT model that you downloaded.
bert_config_file: the bert_config.json file under the Google BERT model directory.
vocab_file: the vocab.txt file under the Google BERT model directory.
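If you prefer to launch training from a script, the parameters above can be assembled into an argument list. The sketch below only builds the command; the function name `build_train_command` and its parameters are hypothetical, while the flags are the ones described in this article.

```python
import os


def build_train_command(data_dir, output_dir, bert_model_dir, init_checkpoint):
    """Build the argument list for bert-base-ner-train.

    Hypothetical helper: bert_model_dir is the downloaded Google BERT
    model directory, and init_checkpoint is whatever value you would
    pass on the command line for -init_checkpoint.
    """
    return [
        "bert-base-ner-train",
        "-data_dir", data_dir,
        "-output_dir", output_dir,
        "-init_checkpoint", init_checkpoint,
        "-bert_config_file", os.path.join(bert_model_dir, "bert_config.json"),
        "-vocab_file", os.path.join(bert_model_dir, "vocab.txt"),
    ]


if __name__ == "__main__":
    # Placeholder paths; replace them with your own directories.
    cmd = build_train_command("data", "output", "bert_model", "bert_model")
    print(" ".join(cmd))
    # To actually start training, pass the list to subprocess:
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.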
After training, you can find the results in the output_dir you specified.
Further reading:
https://blog.csdn.net/macanv/article/details/85684284
Other wrappers around BERT models:
https://www.jianshu.com/p/1d6689851622
https://cloud.tencent.com/developer/article/1470051
https://www.h3399.cn/201908/714454.html
