Published Versions 1 Vol 1 (4) : 333-349 2020
Knowledge Graph Construction and Applications for Web Search and Beyond
: 2019 - 05 - 27
: 2019 - 08 - 18
: 2019 - 08 - 22
Abstract & Keywords
Abstract: Knowledge graphs (KGs) have played an important role in enhancing the performance of many intelligent systems. In this paper, we introduce the solution for building a large-scale, multi-source knowledge graph from scratch at Sogou Inc., including its architecture, technical implementation and applications. Unlike previous works that build knowledge graphs with graph databases, we build the knowledge graph on top of SogouQdb, a distributed search engine developed by the Sogou Web Search Department, which can be easily scaled to support petabytes of data. As a supplement to the search engine, we also introduce a series of models to support inference and graph-based querying. Currently, the data of the Sogou knowledge graph, which are collected from 136 different websites and constantly updated, consist of 54 million entities and over 600 million entity links. We also introduce three applications of the knowledge graph at Sogou Inc.: entity detection and linking, knowledge-based question answering and a knowledge-based dialogue system. These applications have been used in Web search products to help users acquire information more efficiently.
Keywords: Knowledge graph; Search engine; Question answering
Article and author information
Cite As
P. Wang, H. Jiang, J. Xu & Q. Zhang. Knowledge graph construction and applications for Web search and beyond. Data Intelligence 1(2019), 333-349. doi: 10.1162/dint_a_00019
Peilu Wang
All of the authors contributed equally to the work. P. Wang (wangpeilu@sogou-inc.com) and H. Jiang (jianghao216568@sogou-inc.com) mainly drafted the paper, while P. Wang summarized the construction part and H. Jiang summarized the application part. All the authors have made meaningful and valuable contributions in revising and proofreading the resulting manuscript.
Peilu Wang is a researcher at Sogou. He received both his Master’s Degree and Bachelor’s Degree from Shanghai Jiao Tong University. He currently works on explainable entity recommendation, and his main research interests include natural language processing, knowledge graphs and information retrieval.
Hao Jiang
Hao Jiang is a researcher at Sogou. He currently works on question answering and entity linking. He received his Master’s Degree from Nanjing University in 2014 and his Bachelor’s Degree from Nanjing University in 2011. His main research interests include natural language processing, knowledge graphs and chatbots.
Jingfang Xu
All of the authors contributed equally to the work. J. Xu (xujingfang@sogou-inc.com) is the leader of Sogou Knowledge Graph, who drew the blueprint of the whole system. All the authors have made meaningful and valuable contributions in revising and proofreading the resulting manuscript.
Dr. Jingfang Xu is a Vice President at Sogou, Inc. She received her doctoral degree from Tsinghua University. She has rich experience in information retrieval, data mining and natural language processing. As the head of the search engine division, Dr. Xu has made outstanding achievements in Web search, question answering, computer vision, machine translation and other related fields.
Qi Zhang
All of the authors contributed equally to the work. Q. Zhang (qizhang@sogou-inc.com) brought valuable insights and information to the construction and applications of the knowledge graph. All the authors have made meaningful and valuable contributions in revising and proofreading the resulting manuscript.
Qi Zhang is the chief researcher at Sogou Inc. His major interests include natural language processing and information retrieval. He has published more than 70 papers in the related areas. He received his Bachelor’s Degree in Computer Science and Technology from Shandong University in 2003 and his PhD in Computer Science from Fudan University in 2009.
0000-0003-0947-4942
Publication records
Published: Oct. 25, 2019 (Versions 1)
Data Intelligence