Time: 09:00-10:30, 13th October, 2021
Title: Natural Language Question Answering over Knowledge Graph
Abstract: As more and more structured data becomes available on the web, the question of how end users can access this body of knowledge becomes crucially important. As the de facto standard for knowledge bases, an RDF repository is a collection of triples. Although SPARQL is the standard way to access RDF data, it remains tedious and difficult for end users because of the complexity of the SPARQL syntax and the RDF schema. An ideal system should allow end users to benefit from the expressive power of Semantic Web standards (such as RDF and SPARQL) while hiding their complexity behind an intuitive and easy-to-use interface.
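To make that barrier concrete, here is a minimal sketch (not part of the talk) of what answering even a simple question requires when the end user must write SPARQL by hand; the example question, the query, and the public DBpedia endpoint are illustrative assumptions.

```python
# Requires: pip install sparqlwrapper
from SPARQLWrapper import SPARQLWrapper, JSON

# Question: "Which movies were directed by Christopher Nolan?"
# The user must know the ontology prefixes, the predicate name, and the
# triple-pattern syntax before getting any answer at all.
query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE {
    ?film dbo:director dbr:Christopher_Nolan .   # one triple pattern: (subject, predicate, object)
}
LIMIT 10
"""

sparql = SPARQLWrapper("https://dbpedia.org/sparql")   # public DBpedia endpoint (illustrative)
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for binding in results["results"]["bindings"]:
    print(binding["film"]["value"])
```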
In this talk, I first review the two main categories of existing methods for natural language question answering (Q/A) over RDF knowledge graphs: IR (Information Retrieval)-based methods and semantic parsing methods. In addition, pre-trained language models have been a game changer across NLP, including KBQA; I will also briefly review their development and their impact on the KBQA task.
Then, I will talk about our KBQA system, gAnswer, which is based on a graph-matching technique. Our method constructs a semantic query graph to represent the semantics of the natural language question, and then uses subgraph matching to address the disambiguation issue in natural language understanding and to retrieve the answer from the KB. gAnswer has achieved excellent evaluation results on multiple benchmarks and won the QALD-9 knowledge graph QA system competition (hosted by HOBBIT). The source code is available on GitHub: https://github.com/pkumod/gAnswer
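As a rough illustration of the subgraph-matching idea (my own toy example, not gAnswer's code or data): the question is represented as a small query graph containing a variable node, and the answers are the knowledge-graph nodes that the variable can be mapped onto.

```python
import networkx as nx
from networkx.algorithms import isomorphism

# Toy RDF knowledge graph: nodes are entities, edges carry predicate labels.
kg = nx.DiGraph()
for s, p, o in [("Inception", "director", "Christopher_Nolan"),
                ("Dunkirk", "director", "Christopher_Nolan"),
                ("Christopher_Nolan", "birthPlace", "London")]:
    kg.add_node(s, name=s)
    kg.add_node(o, name=o)
    kg.add_edge(s, o, label=p)

# Semantic query graph for "Which movies were directed by Christopher Nolan?"
# "?film" is a variable node; "Christopher_Nolan" is a grounded entity node.
query = nx.DiGraph()
query.add_node("?film", name=None)                       # variable: matches any entity
query.add_node("Christopher_Nolan", name="Christopher_Nolan")
query.add_edge("?film", "Christopher_Nolan", label="director")

def node_match(kg_attrs, q_attrs):
    # A variable node (name=None) matches anything; constants must agree by name.
    return q_attrs["name"] is None or q_attrs["name"] == kg_attrs["name"]

edge_match = isomorphism.categorical_edge_match("label", None)
matcher = isomorphism.DiGraphMatcher(kg, query, node_match=node_match, edge_match=edge_match)

answers = set()
for mapping in matcher.subgraph_isomorphisms_iter():     # maps KG node -> query node
    inverse = {q: v for v, q in mapping.items()}
    answers.add(inverse["?film"])
print(answers)   # expected: {'Inception', 'Dunkirk'}
```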
Speaker: Lei Zou is a professor at the Wangxuan Institute of Computer Technology, Peking University, assistant to the director of the Center for Data Science of Peking University, and a faculty member of the National Engineering Laboratory for Big Data Analysis Technology and Application (PKU), China. Prof. Zou's recent research interests include graph databases and knowledge graphs, particularly graph-based RDF data management, natural language question answering over knowledge graphs, graph analytics, machine learning, and big data systems. He has published more than 50 papers, including more than 30 in reputed journals and major international conferences such as SIGMOD, VLDB, ICDE, TODS, TKDE, VLDB Journal, AAAI, and EMNLP. His research is supported by the NSFC Young Excellent Talent Project and the National Key Research and Development Program of China. Furthermore, Prof. Zou received a Newton Advanced Fellowship from the UK Royal Society.
Time: 10:45-12:15, 13th October, 2021
Title: Learning towards Knowledge-driven Dialogue System between Human and Computer
Abstract: Recently, intelligent human-computer dialogue techniques have improved significantly. A series of industrial applications have been developed alongside the advances in dialogue systems, such as Microsoft XiaoIce, AliMe from Alibaba, and Amazon Echo. Intelligent human-computer dialogue systems open up a huge space for future applications, some perhaps beyond our imagination. However, we are still facing a bottleneck to further progress: the lack of knowledge usually leads to inadequate and awkward conversations with low intelligence. In this tutorial, we will review current techniques for incorporating knowledge into human-computer dialogue systems in order to make conversations more natural and more human-like.
Speaker: Dr. Rui Yan is an associate professor with tenure at the Gaoling School of Artificial Intelligence, Renmin University of China. He was previously a tenure-track assistant professor at the Wangxuan Institute of Computer Technology, Peking University, and a senior researcher at Baidu Inc. He was selected as a young scientist of the Beijing Academy of Artificial Intelligence (BAAI) and a StarTrack young fellow of Microsoft Research Asia (MSRA). To date he has published more than 100 peer-reviewed publications with more than 5,000 citations. He regularly serves as an area chair/senior PC member for top-tier international conferences and has been invited to give tutorial talks at these conferences as well.
Time: 14:00-15:30, 13th October, 2021
Title: Group-Theoretic Self-Supervised Representation Learning
Abstract: A good visual representation is an inference map from observations (images) to features (vectors) that faithfully reflects the structure and transformations of the underlying generative factors (semantics), which are invariant to environmental changes. In this talk, we formulate the notion of a "good" representation from a group-theoretic view using Higgins' definition of disentangled representation, and show that existing Self-Supervised Learning (SSL) can only learn augmentation-related features such as lighting and view shifts, leaving the rest of the high-level semantics entangled. To break this limitation, we propose an iterative SSL method, Iterative Partition-based Invariant Risk Minimization (IP-IRM), which grounds the abstract group actions into a concrete SSL optimization. At each iteration, IP-IRM first partitions the training samples into subsets, where the partition reflects an entangled semantic group action. It then leverages IRM to learn subset-invariant sample similarities, and this invariance guarantees the disentanglement of the corresponding semantic. We prove that IP-IRM converges to a fully disentangled representation, and show its effectiveness on various feature disentanglement and SSL benchmarks.
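The following is a highly simplified sketch of the two-step loop described above, based only on this abstract (it is not the authors' released implementation; the toy encoder, shapes, random data, and the random partition are assumptions, whereas in IP-IRM the partition itself is optimized to expose an entangled semantic).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def contrastive_loss(z1, z2, w=1.0, temperature=0.5):
    # InfoNCE-style loss between two augmented views z1, z2 of shape (n, d);
    # the dummy scale w multiplies the similarity logits (used for the IRM penalty).
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = (z1 @ z2.t()) * w / temperature
    targets = torch.arange(z1.size(0))
    return F.cross_entropy(logits, targets)

def irm_penalty(loss_fn):
    # IRMv1-style penalty: squared gradient of the loss w.r.t. a dummy scale of 1.0.
    w = torch.tensor(1.0, requires_grad=True)
    grad, = torch.autograd.grad(loss_fn(w), w, create_graph=True)
    return grad.pow(2)

encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

x1, x2 = torch.randn(128, 32), torch.randn(128, 32)   # two augmented views (toy data)
# Step 1: partition samples into subsets. In IP-IRM the partition is itself
# learned (to reflect an entangled semantic group action); random here for brevity.
partition = torch.randint(0, 2, (128,))

for _ in range(10):   # Step 2: learn subset-invariant similarities via an IRM penalty.
    z1, z2 = encoder(x1), encoder(x2)
    total = 0.0
    for k in range(2):
        idx = partition == k
        loss_k = lambda w, idx=idx: contrastive_loss(z1[idx], z2[idx], w)
        total = total + loss_k(1.0) + 1.0 * irm_penalty(loss_k)
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
```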
Speaker: Hanwang Zhang is an Assistant Professor at Nanyang Technological University's School of Computer Science and Engineering. His research interests include Computer Vision, Natural Language Processing, Causal Inference, and their combinations. His work has received numerous awards, including IEEE AI's 10 to Watch 2020, the TMM Prize Paper Award 2020, the Alibaba Innovative Research Award 2019, the ACM ToMM Best Paper Award 2018, the Nanyang Assistant Professorship 2018, the ACM SIGIR Best Paper Honourable Mention Award 2016, and the ACM MM Best Student Paper Award 2012. Hanwang and his team work actively on causal inference for connecting vision and language. For example, their scene graph detection benchmark was an IEEE CVPR Best Paper Finalist in 2019, and their visual dialog agent won 1st place in the Visual Dialog Challenge 2019 and 2nd place in 2018 and 2020.
Time: 15:45-17:15, 13th October, 2021
Title: OOD Sample and New Class under Weakly Supervised Scenario
Abstract: Machine learning algorithms tend to fail when the training and test data contain examples from unknown distributions, e.g., out-of-distribution (OOD) samples or samples from new classes, which has become a major challenge in deploying machine learning models in real-world tasks. Previous studies have mainly focused on supervised or unsupervised scenarios, while efforts on weakly supervised scenarios remain limited. In this talk, we briefly introduce some recent research on weakly supervised learning in the presence of samples from unknown distributions. First, we present some efforts on robust weakly supervised learning, followed by more accurate semi-supervised and label-noise learning algorithms that are affected by OOD samples. Then, we present two attempts at detecting examples from new classes in streaming data. Experimental results verify the superiority of the proposed approaches and point to possible future research directions.
Speaker: Yu-Feng Li is an associate professor at the National Key Laboratory for Novel Software Technology, Nanjing University. He received his BSc and PhD degrees in computer science from Nanjing University, China, in 2006 and 2013, respectively. His research interests include semi-supervised learning, weakly supervised learning, and optimization. He has published more than 50 papers in top-tier journals and conference proceedings. He serves as an associate editor of Machine Learning, Neural Networks, etc. He has served as program co-chair of IEEE BigComp 2020, CCML 2021, and MLA 2020, journal track co-chair of ACML 2021, workshop co-chair of ACML 2018, tutorial co-chair of ACML 2019, etc., and as an area chair/senior PC member of ICML, IJCAI, AAAI, ACML, PAKDD, etc.
Time: 09:00-10:30, 14th October, 2021
Title: Will Non-Autoregressive Translation Dominate Neural Machine Translation?
Abstract: Non-autoregressive translation (NAT) has shown its superiority in translation speed and has been drawing more attention recently, but it also faces some problems, among which multimodality is the most prominent. Multimodality is an inherent problem of NMT and is further aggravated in NAT due to the lack of sequential modeling. In this tutorial, we will review the development of NAT modeling, explore the problems of NAT, and present our solution to multimodality.
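As a toy numeric illustration of the multimodality problem (my own example, not taken from the tutorial): if two translations are equally valid, predicting every position independently from its marginal distribution can mix the two modes into an output that belongs to neither.

```python
import numpy as np

vocab = ["thank", "you", "danke", "schön"]
# Two equally good target sequences (the two "modes").
modes = [["thank", "you"], ["danke", "schön"]]

# A non-autoregressive decoder predicts each position from its marginal distribution.
marginals = np.zeros((2, len(vocab)))            # (position, vocab)
for seq in modes:
    for pos, tok in enumerate(seq):
        marginals[pos, vocab.index(tok)] += 0.5

# A real model's position-wise estimates need not favor the same mode everywhere:
# suppose position 0 slightly favors "thank" while position 1 slightly favors "schön".
marginals[0, vocab.index("thank")] += 0.02
marginals[1, vocab.index("schön")] += 0.02

output = [vocab[marginals[pos].argmax()] for pos in range(2)]
print(output)   # ['thank', 'schön'] -- a mix of the two modes, valid in neither language
```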
Speaker: Yang Feng is a Professor at the Institute of Computing Technology, Chinese Academy of Sciences (ICT/CAS). She was selected for the "New Baixing Talent Introduction Program" of ICT/CAS in 2017 and was named one of the outstanding members of CCAI in 2019 (70 people in total). Her research focuses on machine translation and dialogue, and her work on machine translation won the Best Long Paper Award at ACL 2019. She has served as a (Senior) Area Co-chair for top-tier NLP conferences (e.g., ACL and EMNLP) and as an Action Editor of the ACL Rolling Review. She is a recipient of the Technical Committee on NLP "Young Scientist" award.
Time: 10:45-12:15, 14th October, 2021
Title: Self-supervised Reinforcement Learning
Abstract: Reinforcement learning (RL) has achieved great success in recent years but still suffers from sample efficiency and scalability issues, which significantly limit its wide application to real-world scenarios. One major bottleneck of RL is the limited representation power with respect to both environments and policies. In this talk, I will introduce how to leverage self-supervised techniques to increase the representation power of RL along different aspects such as state, policy, and action, and thereby boost learning efficiency and scalability across different scenarios and tasks.
Speaker: Dr. Jianye Hao is an Associate Professor at Tianjin University and Director of the Huawei Noah's Ark Decision-making and Reasoning Lab. His research focuses on reinforcement learning and multiagent systems. Dr. Hao has published over 100 peer-reviewed papers in top conferences and journals and has won a number of best paper awards, including at ASE 2019, DAI 2019, and CoRL 2020. His team's research has been successfully applied in various domains such as game AI, e-commerce recommendation, network optimization, supply chain optimization, and so on.
Time: 14:00-15:30, 14th October, 2021
Title: Generative Adversarial Nets Meet Reinforcement Learning
Abstract: Generative adversarial nets (GANs) have been widely studied during the recent development of deep learning and unsupervised learning. With an adversarial training mechanism, a GAN trains a generative model to fit the underlying unknown real data distribution under the guidance of a discriminative model that estimates whether a data instance is real or generated. This framework was originally proposed for fitting continuous data distributions such as images, so it cannot be directly applied to information retrieval scenarios where the data is mostly discrete, such as IDs, text, and graphs. In this talk, I will focus on GAN techniques and their variants for discrete data modeling with reinforcement learning. Specifically, I will start with the fundamentals of the GAN framework and its extension to discrete data generation via reinforcement learning. Then I will discuss the connection between GANs and imitation learning. Finally, I will discuss GANs for discrete data generation tasks, including IRGAN, SeqGAN, GraphGAN, CommunityGAN, etc., and some advanced applications.
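For readers unfamiliar with the SeqGAN-style combination the talk covers, here is a minimal sketch (my own, with a toy vocabulary, random weights, and a single REINFORCE step; discriminator training on real vs. generated sequences is omitted) of treating the generator as a policy over discrete tokens and using the discriminator's score as the reward, since sampling discrete tokens blocks direct gradient flow.

```python
import torch
import torch.nn as nn

VOCAB, SEQ_LEN, EMB, HID = 20, 8, 16, 32   # toy sizes (assumptions)

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def sample(self, batch):
        # Autoregressively sample a sequence, keeping per-step log-probabilities.
        tok = torch.zeros(batch, 1, dtype=torch.long)      # <bos> token = 0
        h, seq, logps = None, [], []
        for _ in range(SEQ_LEN):
            o, h = self.rnn(self.emb(tok), h)
            dist = torch.distributions.Categorical(logits=self.out(o[:, -1]))
            tok = dist.sample().unsqueeze(1)
            seq.append(tok)
            logps.append(dist.log_prob(tok.squeeze(1)))
        return torch.cat(seq, dim=1), torch.stack(logps, dim=1)

# A crude sequence discriminator: embeds tokens and scores "realness" in [0, 1].
discriminator = nn.Sequential(nn.Embedding(VOCAB, EMB), nn.Flatten(),
                              nn.Linear(EMB * SEQ_LEN, 1), nn.Sigmoid())
gen = Generator()
g_opt = torch.optim.Adam(gen.parameters(), lr=1e-3)

# One REINFORCE step: the discriminator's probability of "real" acts as the reward.
tokens, logps = gen.sample(batch=32)
reward = discriminator(tokens).squeeze(1).detach()         # (32,)
g_loss = -(logps.sum(dim=1) * reward).mean()               # policy-gradient objective
g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```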
Speaker: Weinan Zhang is an associate professor at Shanghai Jiao Tong University. His research interests include reinforcement learning, deep learning, and data science, with various real-world applications in recommender systems, search engines, text mining and generation, game AI, etc. He has published over 100 research papers in international conferences and journals and has been serving as a (senior) PC member for ICML, NeurIPS, ICLR, KDD, AAAI, IJCAI, SIGIR, etc., and as a reviewer for JMLR, TOIS, TKDE, TIST, etc. He received the ACM Rising Star Award in 2017 and the Alibaba DAMO Young Scholar Award in 2018. His research won the Best Paper Honorable Mention Award at SIGIR 2017, the Best Paper Award at the DLP-KDD Workshop 2019, and the Best System Paper Award at CoRL 2020. Weinan earned his Ph.D. from the Computer Science Department of University College London in 2016 and his B.E. from the ACM Class of Shanghai Jiao Tong University in 2011.
Time: 15:45-17:15, 14th October, 2021
Title: Robustness in Natural Language Processing
Abstract: Recent breakthroughs in deep learning theory and technology provide strong support for the wide application of NLP, such as question answering systems, information extraction, and machine translation. A large number of models have emerged whose performance surpasses that of humans when the training and test data are independent and identically distributed (i.i.d.). However, recent research has reported that highly accurate deep neural networks (DNNs) can be vulnerable to carefully crafted adversarial examples, distribution shift, data transformation, and shortcut learning. Recently, researchers have begun to explore ways to assess robustness prior to model deployment. In this talk, we will give a brief introduction to recent studies on robustness in NLP.
Speaker: Qi Zhang is a professor in the School of Computer Science at Fudan University. He received his Ph.D. in Computer Science from Fudan University in 2009 and has been with the School of Computer Science, Fudan University, since April 2009. His research focuses on natural language processing, with special emphasis on robustness, information extraction, sentiment analysis, question answering, and so on. He has published more than 100 papers at top-tier conferences in related research fields, including ACL, EMNLP, ICML, ICLR, NIPS, IJCAI, AAAI, and so on. His recent research won the WSDM Best Paper Runner-Up Award and the COLING Area Chair Favorites Award. He is a recipient of the IBM Faculty Award and the ACM Shanghai Rising Star Award.
Speaker: Dr. Tao Gui is a pre-tenured associate professor at the Institute of Modern Languages and Linguistics, Fudan University. He received his Ph.D. from Fudan University in 2021. His research focuses on natural language processing, information extraction, robust models, and model interpretability. He has published more than 20 papers at top-tier conferences in related research fields, including ACL, EMNLP, SIGIR, IJCAI, AAAI, and so on.