Review Published Versions 1 Vol 3 (3): 418-443, 2021
Data Set and Evaluation of Automated Construction of Financial Knowledge Graph
2021-02-11
2021-03-20
2021-04-22
Abstract & Keywords
Abstract: With advances in entity extraction, relation extraction, knowledge reasoning, and entity linking, research on knowledge graph technologies has progressed rapidly in recent years. To promote the development of knowledge graphs, especially for the Chinese language and the financial industry, we built a high-quality data set, named financial research report knowledge graph (FR2KG), and organized the evaluation on automated construction of financial knowledge graphs at the 2020 China Knowledge Graph and Semantic Computing Conference (CCKS2020). FR2KG consists of 17,799 entities, 26,798 relationship triples, and 1,328 attribute triples, covering 10 entity types, 19 relationship types, and 6 attributes. Participants were required to develop a constructor that automatically builds a financial knowledge graph based on FR2KG. In addition, we summarize the technologies for automatically constructing knowledge graphs, and introduce the methods used by the winners and the results of this evaluation.
Keywords: Knowledge graph; Entity extraction; Relation extraction; FR2KG data set; CCKS
Article and author information
Cite As
Citation: Wang, W.G., et al.: Data set and evaluation of automated construction of financial knowledge graph. Data Intelligence 3(3), 418-443 (2021). doi: 10.1162/dint_a_00108
Wenguang Wang
W.G. Wang is the team leader for this project. He provided overall technical leadership, designed the data schema, performed curation work and contributed to writing and editing of the manuscript.
wangwenguang@datagrand.com
Wenguang Wang received his M.S. degree from Zhejiang University, China. He is currently Vice President of DataGrand Inc. His research interests include knowledge graph, natural language processing, computer vision, deep learning, reinforcement learning and content generation. He has ten patents and several academic publications in artificial intelligence. He is a member of China Computer Federation (CCF), Chinese Association for Artificial Intelligence (CAAI) and Chinese Information Processing Society of China (CIPS).
0000-0002-9617-0818
Yonglin Xu
Y.L. Xu investigated the latest progress of Relation Extraction and wrote relevant chapters. All the authors have made meaningful and valuable contributions in revising and proofreading the resulting manuscript.
Yonglin Xu is currently an algorithm engineer at DataGrand Inc. He received his Master’s degree from Shanghai University in 2019. His research interests include information extraction and knowledge graph.
0000-0002-7716-0841
Chunhui Du
C.H. Du investigated the latest progress of Entity Extraction and wrote relevant chapters. All the authors have made meaningful and valuable contributions in revising and proofreading the resulting manuscript.
Chunhui Du received his B.S. degree from the University of Electronic Science and Technology of China in 2018. He is currently pursuing a PhD degree at the School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, China. His research interests include federated learning and natural language processing.
0000-0002-9086-6021
Yunwen Chen
Y.W. Chen organized the data annotation and contributed to writing and editing of the manuscript as a senior author. All the authors have made meaningful and valuable contributions in revising and proofreading the resulting manuscript.
Yunwen Chen received his PhD degree from Fudan University, China. He is the founder and the CEO of DataGrand Inc., Shanghai, a leading AI company in China. He was the Chief Data Officer of Shanda, Inc., Burlington, IA, USA, the senior Director of Tencent, Inc., Shenzhen, China, and a researcher of Baidu, Inc., Beijing, China. He has 32 patents and several academic publications. His current research interests include data mining, natural language processing, search and recommendation systems, and knowledge graphs. Dr. Chen was a recipient of the Distinguished Graduate Student in 2008. He is a Senior Member of the China Computer Federation (CCF) and a member of the ACM.
0000-0002-9086-6021
Yijie Wang
Y.Y. Wang and H. Wen contributed to the statistics and review of evaluation results. All the authors have made meaningful and valuable contributions in revising and proofreading the resulting manuscript.
Yijie Wang is an algorithm engineer at DataGrand Inc. He received his Master’s degree from Northeastern University in 2018. His research interests include data mining and knowledge graph.
0000-0003-1310-7467
Hui Wen
Y.Y. Wang and H. Wen contributed to the statistics and review of evaluation results. All the authors have made meaningful and valuable contributions in revising and proofreading the resulting manuscript.
Hui Wen graduated with a Master’s degree in Computer Application Technology from Tongji University in 2014. He is a co-founder and senior scientist of DataGrand Inc, responsible for the research and development of knowledge graphs and data mining. He has rich R&D experience and strong interests in knowledge graphs, recommendation systems, search systems, distributed systems and so on.
0000-0003-1310-7467
Publication records
Published: Sept. 16, 2021 (Version 1)