Published Version 1, Vol. 3 (3): 460-476, 2021
A Prior Information Enhanced Extraction Framework for Document-level Financial Event Extraction
2021-05-18
2021-04-08
2021-02-11
Abstract & Keywords
Abstract: Document-level financial event extraction (DFEE) is the task of detecting events and extracting the corresponding event arguments in financial documents, and it plays an important role in information extraction in the financial domain. The task is challenging because financial documents are generally long and the arguments of a single event may be scattered across different sentences. To address this issue, we propose a novel Prior Information Enhanced Extraction framework (PIEE) for DFEE, which leverages prior information from both event types and pre-trained language models. Specifically, PIEE consists of three components: event detection, event argument extraction, and event table filling. In event detection, we identify the event type. The event type is then explicitly used for event argument extraction. Meanwhile, the implicit information within language models also provides considerable cues for event argument localization. Finally, all the event arguments are filled into an event table by a set of predefined heuristic rules. To demonstrate the effectiveness of the proposed framework, we participated in the shared task of CCKS2020 Task5-2: Document-level Event Arguments Extraction. On both Leaderboard A and Leaderboard B, PIEE took first place and significantly outperformed the other systems.
Keywords: Event extraction; Information extraction; Financial event; Event detection; Event argument extraction
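The three-stage flow described in the abstract (event detection, type-conditioned argument extraction, heuristic table filling) can be sketched as a toy program. This is not the PIEE implementation: the trained event detector and the language-model-based argument extractor are replaced by keyword and regular-expression stand-ins, and the schema (`SCHEMA`), trigger words (`TRIGGERS`), role patterns (`ROLE_PATTERNS`), the English example text, and the "earliest mention wins" table-filling rule are all hypothetical. What the sketch is meant to show is the data flow: the detected event type restricts which argument roles are searched for, candidates are collected from every sentence of the document, and a heuristic rule merges them into one event-table row.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical event schema (event type -> argument roles); NOT the CCKS2020 schema.
SCHEMA: Dict[str, List[str]] = {
    "EquityFreeze": ["holder", "shares", "court"],
    "EquityPledge": ["pledger", "shares", "pledgee"],
}

# Hypothetical trigger words standing in for a trained event-type classifier.
TRIGGERS: Dict[str, List[str]] = {
    "EquityFreeze": ["frozen", "freeze"],
    "EquityPledge": ["pledge"],
}

# Hypothetical role patterns standing in for a neural argument extractor.
ROLE_PATTERNS: Dict[str, str] = {
    "holder": r"shareholder ([A-Z]\w+)",
    "pledger": r"shareholder ([A-Z]\w+)",
    "shares": r"(\d[\d,]*) shares",
    "court": r"([A-Z]\w+ Court)",
    "pledgee": r"to ([A-Z]\w+ Bank)",
}


def detect_events(doc: str) -> List[str]:
    """Stage 1: return every event type whose trigger words occur in the document."""
    text = doc.lower()
    return [etype for etype, words in TRIGGERS.items() if any(w in text for w in words)]


def extract_arguments(doc: str, etype: str) -> List[Tuple[str, str, int]]:
    """Stage 2: scan all sentences, searching only for the roles of the given event type."""
    sentences = [s.strip() for s in doc.split(".") if s.strip()]
    found: List[Tuple[str, str, int]] = []
    for idx, sentence in enumerate(sentences):
        for role in SCHEMA[etype]:  # the event type acts as a prior on which roles to look for
            for match in re.finditer(ROLE_PATTERNS[role], sentence):
                found.append((role, match.group(1), idx))
    return found


def fill_table(etype: str, args: List[Tuple[str, str, int]]) -> Dict[str, str]:
    """Stage 3: hypothetical heuristic rule, the earliest mention of each role wins."""
    row: Dict[str, str] = {"event_type": etype}
    for role, value, _ in sorted(args, key=lambda a: a[2]):
        row.setdefault(role, value)
    return row


def extract(doc: str) -> List[Dict[str, str]]:
    """Run detection, type-conditioned argument extraction, and table filling."""
    return [fill_table(etype, extract_arguments(doc, etype)) for etype in detect_events(doc)]


if __name__ == "__main__":
    doc = (
        "The company announced that shares held by shareholder Zhang were frozen. "
        "The freeze covers 5,000,000 shares. The order was issued by Shanghai Court."
    )
    print(extract(doc))
    # [{'event_type': 'EquityFreeze', 'holder': 'Zhang', 'shares': '5,000,000', 'court': 'Shanghai Court'}]
```

In the example, the holder, the number of shares, and the court each appear in a different sentence. A purely sentence-level extractor would see them in isolation; collecting candidates over the whole document and merging them in a separate table-filling step is what lets them end up in a single event record.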
Article and author information
Cite As
Citation: Wang, H.T., et al.: A prior information enhanced extraction framework for document-level financial event extraction. Data Intelligence 3(3), 460-476 (2021). doi: 10.1162/dint_a_00103
Haitao Wang
H.T. Wang contributed to data set statistics, design of experiments and manuscript writing. All authors have made meaningful and valuable contributions in revising and proofreading manuscripts.
Haitao Wang is a postgraduate student at the School of Computer Science and Technology, Soochow University. He is interested in multi-format information extraction, especially semantic relation extraction between entities, and document-level event extraction.
0000-0003-2531-149X
Tong Zhu
T. Zhu contributed to data set statistics, experiments with different pretrained language models and manuscript writing. All authors have made meaningful and valuable contributions in revising and proofreading manuscripts.
Tong Zhu is a postgraduate student at the School of Computer Science and Technology, Soochow University. He is interested in multi-format information extraction, especially semantic relation extraction between entities, and document-level event extraction.
0000-0002-5433-8504
Mingtao Wang
M.T. Wang contributed to strategies for extracting equity freeze events, table content parsing and data annotation. All authors have made meaningful and valuable contributions in revising and proofreading manuscripts.
Mingtao Wang is a graduate student at the School of Computer Science and Technology, Soochow University, and is currently a research assistant at the university. He is interested in information extraction, dialogue systems and knowledge base question answering.
0000-0002-8838-1519
Guoliang Zhang
G.L. Zhang contributed to data set statistics, bad case analysis, data annotation and manuscript revision. All authors have made meaningful and valuable contributions in revising and proofreading manuscripts.
Guoliang Zhang is a postgraduate student at the School of Computer Science and Technology, Soochow University. He is interested in multi-format information extraction, especially semantic relation extraction between entities or events, and event extraction.
0000-0002-3639-0712
Wenliang Chen
W.L. Chen contributed to data set statistics, design of the whole framework and manuscript writing. All authors have made meaningful and valuable contributions in revising and proofreading manuscripts.
wlchen@suda.edu.cn
Wenliang Chen received his Bachelor’s degree in Mechanical Engineering and PhD degree in Computer Science from Northeastern University in 1999 and 2005, respectively. He joined Soochow University in 2013 and is currently a professor at the university. Prior to joining Soochow University, he was a research scientist at the Institute for Infocomm Research, Singapore, from 2011 to 2013. From 2005 to 2010, he worked as an expert researcher at NICT, Japan. His current research interests include parsing, machine translation, and machine learning.
0000-0002-5429-6084
This research was supported by the National Natural Science Foundation of China (No. 61936010 and No. 61876115). This work was partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization.
Publication records
Published: Sept. 16, 2021 (Version 1)
Data Intelligence