Published Versions 1 Vol 3 (3) : 389-401 2021
Semi-Supervised Noisy Label Learning for Chinese Clinical Named Entity Recognition
: 2021 - 02 - 15
: 2021 - 04 - 08
: 2021 - 04 - 30
Abstract & Keywords
Abstract: This paper describes our approach to the Chinese clinical named entity recognition (CNER) task organized by the 2020 China Conference on Knowledge Graph and Semantic Computing (CCKS) competition. The task requires identifying the boundaries and category labels of six types of entities in Chinese electronic medical records (EMRs). We constructed a hybrid system composed of a semi-supervised noisy label learning model based on adversarial training and a rule-based post-processing module. The core idea of the hybrid system is to reduce the impact of data noise by optimizing the model results. In addition, we used post-processing rules to correct three kinds of errors in the model predictions: redundant labeling, missing labeling, and wrong labeling. Our method achieved scores of 0.9156 under the strict criterion and 0.9660 under the relaxed criterion on the final test set, ranking first.
Keywords: Named entity recognition; Electronic medical record; Noisy label learning; Semi-supervised; Adversarial training
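The abstract reports results under a strict and a relaxed criterion without defining them. The sketch below illustrates one common reading, assumed here and not confirmed by this page: a strict match requires identical boundaries and category, while a relaxed match requires only overlapping spans with the same category. The function names, the choice of F1 as the score, and the category labels "disease" and "drug" are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

# An entity is (start, end_exclusive, category); this layout is an assumption.
Entity = Tuple[int, int, str]


def strict_match(pred: Entity, gold: Entity) -> bool:
    """Assumed strict criterion: same boundaries and same category."""
    return pred == gold


def relaxed_match(pred: Entity, gold: Entity) -> bool:
    """Assumed relaxed criterion: same category and overlapping spans."""
    return pred[2] == gold[2] and pred[0] < gold[1] and gold[0] < pred[1]


def f1_score(preds: List[Entity], golds: List[Entity],
             match: Callable[[Entity, Entity], bool]) -> float:
    """F1 over entities; each gold entity can be matched at most once."""
    unmatched = list(golds)
    hits = 0
    for p in preds:
        for g in unmatched:
            if match(p, g):
                unmatched.remove(g)
                hits += 1
                break
    if hits == 0:
        return 0.0
    precision = hits / len(preds)
    recall = hits / len(golds)
    return 2 * precision * recall / (precision + recall)


# Toy example: the second prediction has the right category but a shifted start.
golds = [(0, 3, "disease"), (5, 8, "drug")]
preds = [(0, 3, "disease"), (6, 8, "drug")]
strict_f1 = f1_score(preds, golds, strict_match)
relaxed_f1 = f1_score(preds, golds, relaxed_match)
```

In this toy case the boundary error counts against the strict score but not the relaxed one, which is the kind of gap the two reported figures reflect under the assumed definitions.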
Article and author information
Cite As
Citation: Li, Z.C., et al.: Semi-supervised noisy label learning for Chinese clinical named entity recognition. Data Intelligence 3(3), 389-401 (2021). doi: 10.1162/dint_a_00099
Zhucong Li
Zhucong Li is currently a graduate student at School of Artificial Intelligence, University of Chinese Academy of Sciences. His research interests include information extraction, knowledge graph and natural language processing.
0000-0002-6057-5784
Zhen Gan
Zhen Gan is currently a graduate student at Beijing University of Chemical Technology. His research interests mainly focus on natural language processing and information extraction.
0000-0002-5128-3501
Baoli Zhang
Baoli Zhang received his Master’s degree in Computer Science from Beijing University of Posts and Telecommunications. He is now an engineer at the Institute of Automation, Chinese Academy of Sciences. His research interests mainly focus on information extraction, knowledge graph and natural language processing.
0000-0002-5815-7292
Yubo Chen
Yubo Chen is an Associate Professor of the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences. His research interests include information extraction, knowledge graph and natural language processing. He has published over 30 papers in prestigious conferences such as ACL, EMNLP, COLING, AAAI and IJCAI.
0000-0002-5485-9916
Jing Wan
Jing Wan is an Associate Professor of Beijing University of Chemical Technology. Her research interests include knowledge graph and cultural heritage digital protection. She has published over 40 papers.
0000-0002-4232-7883
Kang Liu
Kang Liu is currently a Professor of the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences. His research interests include natural language processing, knowledge graph, and question answering. He has published over 90 papers in journals such as IEEE Transactions on Knowledge and Data Engineering (TKDE) and conferences such as ACL, IJCAI, CIKM, EMNLP, and COLING. He won the COLING 2014 Best Paper Award.
0000-0002-6083-8433
Jun Zhao
Jun Zhao is a Professor of the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, and School of Artificial Intelligence, University of Chinese Academy of Sciences. Prof. Zhao has published over 90 peer-reviewed papers in prestigious conferences and journals, including ACL and AAAI. He won the COLING 2014 Best Paper Award.
0000-0002-6083-8433
Shengping Liu
Shengping Liu received his PhD degree from the Department of Information Science, School of Mathematics, Peking University. Now he is a senior technical expert of UNISOUND AI Technology Co., Ltd.
This work is supported by the National Key R&D Program of China (2020AAA0106400), the National Natural Science Foundation of China (No. 61831022, No. 61806201) and the Key Research Program of the Chinese Academy of Sciences (Grant No. ZDBS-SSW-JSC006). This work is also supported by Beijing Academy of Artificial Intelligence (BAAI).
Publication records
Published: Sept. 15, 2021 (Versions 1)