NLPCC 2022 will follow the NLPCC tradition of holding several shared tasks in natural language processing and Chinese computing. This year’s shared tasks focus on both classical problems and newly emerging problems, including Question Answering with Knowledge Models, Speech Entity Linking, Multimodal Product Summarization, Multimodal Dialogue Understanding and Generation, ‘Multi-label Classification, NER, Content Extraction for Scientific Literature’, ‘Dialogue Text Analysis, Topic Extraction and Dialogue Summary’ and Fine-Grain Dialogue Social Bias Measurement.
Participants from both academia and industry are welcome. Each group can participate in one or multiple tasks. Participants will be invited to submit papers to the main conference, and accepted papers will appear in the conference proceedings published by Springer LNCS.
The top three participating teams in each task will receive certificates from NLPCC and CCF-NLP. If a task has multiple sub-tasks or tracks, only the top-ranked team in each sub-task/track will receive a certificate.
There are seven shared tasks at this year’s NLPCC conference. A detailed description of each task will be given in the task guidelines, to be released later. Here we provide a brief overview of each task.
◇ Task 1 - Question Answering with Knowledge Models
Given a pretrained language model and a domain-specific text, competitors continue training the language model so that it comprehends the knowledge in the text. In the test stage, the models submitted by the competitors are asked questions about the knowledge in the given text, and the correct answers are expected to appear in the text the models generate. No questions are available to the competitors as training data before the test stage. At inference time, answer generation should depend purely on the language model, without any access to the domain-specific text.
Organizer: IDEA-CCNL
Contact: Jiaxing ZHANG (ccnl_public@idea.edu.cn)
◇ Task 2 - Speech Entity Linking
Speech Entity Linking, which means recognizing named entities in speech and disambiguating them against a knowledge base (KB), is a challenging task. This task aims to study entity linking methods for spoken language as well as the fusion of the speech and text modalities.
This speech entity linking task includes two tracks:
● Track 1. Entity Recognition and Disambiguation: process an utterance and its corresponding transcript to extract the entities (i.e., Named Entity Recognition) and then disambiguate the extracted entities to the correct entries in a given knowledge base.
● Track 2. Entity Disambiguation-Only: take gold-standard named entities directly as input and only disambiguate them to the correct entries in a given knowledge base.
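The difference between the two tracks above can be sketched in a few lines of Python. This is only an illustrative toy, not the organizers' baseline: the knowledge base `TOY_KB`, its entry IDs, and the word-overlap scoring are all invented for the example, and the sketch works on the transcript alone, whereas the actual task also provides speech.

```python
from dataclasses import dataclass

# Hypothetical toy knowledge base: surface form -> list of (entry ID, description).
# IDs and descriptions are invented for illustration only.
TOY_KB = {
    "apple": [("Q1", "fruit tree juice"), ("Q2", "technology company iphone")],
    "paris": [("Q3", "capital city france")],
}


@dataclass(frozen=True)
class Mention:
    start: int  # index of the first token of the mention
    end: int    # index one past the last token of the mention
    text: str   # surface form as it appears in the transcript


def recognize(tokens):
    """Recognition step (Track 1 only): exact single-token lookup in the toy KB."""
    return [Mention(i, i + 1, tok) for i, tok in enumerate(tokens)
            if tok.lower() in TOY_KB]


def disambiguate(tokens, mention):
    """Disambiguation step (both tracks): choose the candidate whose
    description shares the most words with the transcript."""
    context = {t.lower() for t in tokens}
    candidates = TOY_KB.get(mention.text.lower(), [])
    if not candidates:
        return None
    best = max(candidates, key=lambda c: len(context & set(c[1].split())))
    return best[0]


def track1(tokens):
    """Track 1: recognize mentions, then link each one."""
    return [(m, disambiguate(tokens, m)) for m in recognize(tokens)]


def track2(tokens, gold_mentions):
    """Track 2: mentions are given; only link them."""
    return [(m, disambiguate(tokens, m)) for m in gold_mentions]
```

In Track 1 an error in `recognize` propagates to linking, while Track 2 isolates the quality of the disambiguation step by starting from gold mentions.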
Organizer: School of Computer Science and Technology, Beijing Institute of Technology, and Wangxuan Institute of Computer Technology, Peking University
Contact: Ruoyu SONG (songruoyu@bit.edu.cn)
◇ Task 3 - Multimodal Product Summarization
This task aims to generate a product summary from multimodal heterogeneous information: a textual product description composed of a brief product title and a long product description, a product knowledge base containing information about product attributes, and a product image.
Despite the rapid development of text summarization techniques, especially pre-training-based models, many difficulties and challenges remain. First, the appearance of a product plays an important role in making a good first impression, which requires the summarizer to fuse textual and visual information effectively. Second, unfaithful product summaries, i.e., summaries that state wrong attributes inconsistent with the original product, mislead users and decrease the public credibility of the e-commerce platform. Faithfulness is therefore the bottom line for a product summarization system. Third, to arouse consumers' attention and desire to purchase, the generated summary should mention the most distinctive characteristics of the product. Teams participating in the shared task are expected to address these challenges.
Organizer: JD AI Research
Contact: Haoran LI (lihaoran24@jd.com)
◇ Task 4 - Multimodal Dialogue Understanding and Generation
The multimodal dialogue understanding and generation task can be divided into two phases: multimodal context understanding and response generation. The former includes dialogue session identification (i.e., determining whether the dialogue content has changed) and dialogue scene identification (i.e., determining whether the video context has changed). The ultimate goal is to generate a response that is coherent with the dialogue context and relevant to the video context. This task includes three tracks:
● Track 1: Dialogue scene identification: predict the boundaries of different dialogue scenes given a set of continuous dialogue utterances and a related video.
● Track 2: Dialogue session identification: predict the boundaries of different dialogue sessions given a set of continuous dialogue utterances and a related video (the same input as in Track 1).
● Track 3: Dialogue response generation: generate a response based on scene and session predictions, while coherently catching up with the conversation.
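As a rough illustration of what "predicting boundaries" in Tracks 1 and 2 can look like, the sketch below converts per-utterance segment IDs into boundary positions and compares predicted with gold boundaries. The data representation and the use of F1 are assumptions made for this example; the official input format and evaluation metric are defined in the task guidelines.

```python
def boundaries_from_ids(segment_ids):
    """Return every index i such that utterance i is the last one of its
    segment, i.e. utterance i + 1 belongs to a different scene or session."""
    return [i for i in range(len(segment_ids) - 1)
            if segment_ids[i] != segment_ids[i + 1]]


def boundary_f1(predicted, gold):
    """F1 over boundary positions (an assumed metric, for illustration only)."""
    pred, ref = set(predicted), set(gold)
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


# Six utterances grouped into three segments: [0, 0] [1, 1, 1] [2]
gold = boundaries_from_ids([0, 0, 1, 1, 1, 2])
```

The same representation serves both tracks: only the meaning of a segment changes (a video scene in Track 1, a dialogue session in Track 2).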
Organizer: Wangxuan Institute of Computer Technology, Peking University
Contact: Xueliang ZHAO (xl.zhao@pku.edu.cn)
◇ Task 5 - Multi-label Classification, NER, Content Extraction for Scientific Literature
Managing and exploring scientific literature efficiently is becoming more and more important for researchers and others, owing to the rapidly growing number of publications. With the development of artificial intelligence (AI) and natural language processing (NLP) technology, literature management and exploration can be made more efficient through AI-based literature classification, literature search and recommendation, and PDF content extraction. To address the related fundamental problems, we set three tracks in this task (with three datasets provided):
● Track 1: Multi-label Classification Model for English Scientific Literature: develop a multi-label classification model for scientific research literature based on the given metadata (title and abstract) of scientific research literature and corresponding hierarchical labels in a specific domain.
● Track 2: Supervised Named Entity Recognition Model for English Domain-specific Texts: develop a named entity recognition model for domain-specific texts based on state-of-the-art NLP and deep learning techniques, using labelled domain-specific sentences covering seven entity types.
● Track 3: Detection and Extraction of Inline and Isolated Mathematical Expressions Model for English PDF Articles: design and develop a model that can automatically recognize and extract the inline and isolated mathematical expressions from the provided English PDF articles without using any commercial third-party APIs.
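For Track 1, one common way to handle hierarchical labels in multi-label classification is to expand each assigned label with all of its ancestors, so that a paper tagged with a fine-grained label also counts as belonging to the broader categories. The sketch below assumes a simple child-to-parent map; the label names are invented, and the actual label set and its format come with the task data.

```python
# Hypothetical child -> parent map; real labels are provided with the dataset.
PARENT = {
    "organic chemistry": "chemistry",
    "polymer synthesis": "organic chemistry",
}


def with_ancestors(labels):
    """Return the given labels together with all of their ancestors."""
    expanded = set()
    for label in labels:
        # Walk up the hierarchy until the root or an already-visited label.
        while label is not None and label not in expanded:
            expanded.add(label)
            label = PARENT.get(label)
    return expanded
```

For example, `with_ancestors({"polymer synthesis"})` yields the leaf label plus `"organic chemistry"` and `"chemistry"`.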
Organizer: CNPIEC KEXIN LTD and Data Intelligence
Contact: He ZHANG (rd_kexin@cnpiec.com.cn)
◇ Task 6 - Dialogue Text Analysis, Topic Extraction and Dialogue Summary
This task uses recorded dialogue texts from the real estate field as the corpus and aims to extract topics and generate dialogue summaries.
● Track 1. Dialogue Topic Extraction: aims to extract topics from dialogue text automatically and explore the hidden structures of dialogue text through latent topics.
● Track 2. Dialogue Text Summary: In the dialogue domain, summarization is mainly used to distill the intentions and concerns of the interlocutors. Compared with traditional text summaries, dialogue summaries involve more diverse role information and more complex contextual connections. This track aims to summarize the concerns of customers when purchasing real estate, so as to facilitate sales review and help sales staff understand the customers' core requirements. The roles in this task are sales staff and customers; the input is a multi-turn dialogue, and the output is a summary of the customer's concerns in that dialogue.
Organizer: AISpeech
Contact: Qingliang MIAO (qingliang.miao@aispeech.com)
◇ Task 7 - Fine-Grain Dialogue Social Bias Measurement
This task aims to measure social bias in dialogue scenarios. Because biased utterances can be subtle in expression and subjective in nature, measuring social bias requires rigorous analysis and normative reasoning. Competitors are therefore provided with a well-annotated training dataset with detailed analyses covering context-sensitivity, data type, targeted group, and implied attitude. At the test stage, the task offers a more practical scenario in which only dialogues are provided, and competitors must predict a fine-grained category (i.e., irrelevant, anti-bias, neutral, or biased) with respect to dialogue social bias.
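Since the test-time output is one of four categories per dialogue, a system's predictions can be scored per category. The sketch below uses macro-averaged F1 as one plausible choice; this metric is an assumption for illustration, and the official evaluation is specified in the task guidelines.

```python
# The four categories named in the task description.
CATEGORIES = ("irrelevant", "anti-bias", "neutral", "biased")


def macro_f1(gold, pred):
    """Unweighted mean of per-category F1 (assumed metric, for illustration)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = []
    for category in CATEGORIES:
        tp = sum(g == category and p == category for g, p in zip(gold, pred))
        fp = sum(g != category and p == category for g, p in zip(gold, pred))
        fn = sum(g == category and p != category for g, p in zip(gold, pred))
        denominator = 2 * tp + fp + fn
        # A category with no gold and no predicted instances scores 0 here.
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)
```

Macro-averaging gives each category equal weight, so a system cannot score well by predicting only the most frequent category.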
Organizer: The CoAI group, DCST, Institute for Artificial Intelligence, Tsinghua University, HCCL, The Chinese University of Hong Kong, and Huawei Noah’s Ark Lab
Contact: Jingyan ZHOU (jyzhou@se.cuhk.edu.hk)
◇ Task 1 - Question Answering with Knowledge Models
Please fill out the Shared Task 1 Registration Form (Word File) and send it to the following registration email.
Registration Email: ccnl_public@idea.edu.cn
◇ Task 2 - Speech Entity Linking
Please fill out the Shared Task 2 Registration Form (Word File) and send it to the following registration email.
Registration Email: songruoyu@bit.edu.cn
◇ Task 3 - Multimodal Product Summarization
Registration website: https://github.com/jd-aig/multimodal-product-summarization-challenge
◇ Task 4 - Multimodal Dialogue Understanding and Generation
Please fill out the Shared Task 4 Registration Form (Word File) and send it to the following registration email.
Registration Email: xl.zhao@pku.edu.cn
◇ Task 5 - Multi-label Classification, NER, Content Extraction for Scientific Literature
Please fill out the Shared Task 5 Registration Form (Word File) and send it to the following registration email.
Registration Email: rd_kexin@cnpiec.com.cn
◇ Task 6 - Dialogue Text Analysis, Topic Extraction and Dialogue Summary
Please fill out the Shared Task 6 Registration Form (Word File) and send it to the following registration email.
Registration Email: qingliang.miao@aispeech.com
◇ Task 7 - Fine-Grain Dialogue Social Bias Measurement
Please fill out the Shared Task 7 Registration Form (Word File) and send it to the following registration email.
Registration Email: jyzhou@se.cuhk.edu.hk
2022/03/15: announcement of shared tasks and call for participation;
2022/03/15: registration open;
2022/04/06: release of detailed task guidelines & training data;
2022/05/05: registration deadline;
2022/05/10: release of test data;
2022/05/20: participants’ results submission deadline;
2022/05/30: evaluation results release and call for system reports and conference paper;
2022/06/20: conference paper submission deadline (only for shared tasks);
2022/07/04: conference paper accept/reject notification;
2022/07/18: camera-ready paper submission deadline.
Evaluation papers must be written in English. Accepted papers will appear in the proceedings of the NLPCC 2022 conference, which will be published as a volume in the Springer LNAI series (EI & ISTP indexed). Submissions must follow the LNCS formatting instructions and be formatted in accordance with the standard Springer style sheets (LaTeX or Microsoft Word). The maximum paper length is 12 pages, including references. Manuscripts should be submitted electronically in PDF format through the submission website (https://www.softconf.com/nlpcc/eval-2022). Email submissions will not be accepted.
Yunbo CAO, Tencent
Zhe CHEN, CNPIEC KEXIN LTD
Jiawen DENG, The CoAI group, DCST, Institute for Artificial Intelligence, Tsinghua University
Ruyi GAN, IDEA-CCNL
Fujiang GE, AISpeech
Yuhang GUO, School of Computer Science and Technology, Beijing Institute of Technology
Minlie HUANG, The CoAI group, DCST, Institute for Artificial Intelligence, Tsinghua University
Haoran LI, JD AI Research
Fenghong LIU, Data Intelligence
Fei MI, Huawei Noah’s Ark Lab
Qingliang MIAO, AISpeech
Ruoyu SONG, School of Computer Science and Technology, Beijing Institute of Technology
Yuxuan WANG, Wangxuan Institute of Computer Technology, Peking University
Youzheng WU, JD AI Research
Ziwei WU, IDEA-CCNL
He ZHANG, CNPIEC KEXIN LTD
Jiaxing ZHANG, IDEA-CCNL
Dongyan ZHAO, Wangxuan Institute of Computer Technology, Peking University
Xueliang ZHAO, Wangxuan Institute of Computer Technology, Peking University
Jingyan ZHOU, HCCL, The Chinese University of Hong Kong