NLPCC 2024 will follow the NLPCC tradition of holding several shared tasks in natural language processing and Chinese computing. This year’s shared tasks cover both classical and newly emerging problems, including Visual Chinese Character Checking, Nominal Compound Chain Extraction, Dialogue-level Coreference Resolution and Relation Extraction, Chinese Essay Discourse Logic Evaluation and Integration, Argument Mining for Chinese Argumentative Essay, Scientific Literature Survey Generation, Multilingual Medical Instructional Video Question Answering Challenge, Multi-Modal Entity Set Expansion, Metaphor Generation, and Regulating Large Language Models.
Participants from both academia and industry are welcome. Each group may participate in one or multiple tasks. Participants will be invited to submit papers to the main conference, and accepted papers will appear in the conference proceedings published by Springer LNCS.
The top 3 participating teams of each task will be certified by NLPCC and CCF-NLP. If a task has multiple sub-tasks or tracks, only the top participating team of each sub-task/track will be certified.
There are ten shared tasks in this year’s NLPCC conference; a detailed description of each task can be found in the task guidelines, to be released later. Here we provide a brief overview of each task.
◇ Task 1 - Visual Chinese Character Checking
In the real world, where handwriting accounts for the vast majority of written text, the characters that humans get wrong include faked characters (i.e., untrue characters created due to writing errors) and misspelled characters (i.e., true characters used incorrectly due to spelling errors), as illustrated in Figure 1. The Visual Chinese Character Checking task aims to detect and correct wrong characters in a given text on an image in real-world scenarios. This task focuses on correcting characters based on a human-annotated Visual Chinese Character Checking dataset containing faked and misspelled Chinese characters.
Organizer: Tsinghua University
Contact: Haitao Zheng (zheng.haitao@sz.tsinghua.edu.cn), Yinghui Li (liyinghu20@mails.tsinghua.edu.cn), Zishan Xu (xzs23@mails.tsinghua.edu.cn)
◇ Task 2 - Nominal Compound Chain Extraction
Given a Chinese document, the task aims to cluster all nominal compounds into several distinct chains, where a nominal compound is defined as a span containing several contiguous nouns in the text. The formulation is similar to coreference resolution, yet the clustering rules during annotation are based on semantic relatedness rather than coreference. Model effectiveness is evaluated by comprehensively considering performance on both the nominal compound detection and chain extraction sub-tasks.
Organizer: National University of Singapore
Contact: Hao Fei (haofei37@nus.edu.sg), Bobo Li
◇ Task 3 - Dialogue-level Coreference Resolution and Relation Extraction
Dialogue Relation Extraction (DRE) focuses on the relations between arguments in a conversation, which can help better understand the interactions between interlocutors. The phenomenon of reference is widely present in natural language, especially in conversations, and dialogue coreference resolution can provide more accurate contextual understanding for the DRE task. We introduce a dialogue-level coreference resolution and relation extraction task, and evaluate models' ability to recognize dialogue-level coreference information and to extract relations using that coreference information.
Organizer: Wuhan University
Contact: Yiyun Xiong (yiyunxiong@whu.edu.cn), Fei Li, Bobo Li, Hao Fei, Chong Teng, Donghong Ji
◇ Task 4 - Chinese Essay Discourse Logic Evaluation and Integration
In the realm of middle and high school examination scoring, essay evaluation is notably challenging, time-consuming, and contentious. With skepticism surrounding the accuracy and efficiency of traditional methods, there is growing interest in machine scoring as an alternative. While significant advances have been made in automatic essay assessment technology, they predominantly focus on linguistic elements, often neglecting the crucial role of discourse logic in determining essay quality. Recognizing this gap, East China Normal University's CubeNLP laboratory and Microsoft Research Asia developed the CEDCC dataset to enrich automatic essay evaluation by emphasizing logical structure. The "Discourse Logic Evaluation and Integration" shared task aims to improve the accuracy and efficiency of essay scoring by exploring essays' discourse logic, encouraging innovative approaches that use the CEDCC dataset to advance automatic essay assessment technology.
This shared task includes three tracks:
● Track 1: Discourse Logic Error Detection (DLED). Given a middle school student's essay, annotators are required to identify whether each sentence contains logical errors, including improper use of connective words, lack of logical relation between contexts, unreasonable sentence breaks, deviation from the theme, etc. This task aims to uncover the impact of these fine-grained factors on the coherence of the essay.
● Track 2: Topic Coherence Modeling (TCM). The task involves identifying the topic sentence of each paragraph and evaluating the logical coherence between these topic sentences in a middle school student's essay. Annotators are required to identify the topic sentence of each paragraph and determine which sentence best represents the central idea of the entire text (the main topic sentence). Subsequently, they assess the logical relationship between adjacent topic sentences, analyzing how they interconnect to support the overall structure of the article.
● Track 3: Discourse Coherence Feedback Generation (DCFG). Annotators assess a middle school essay's discourse coherence, focusing on four key areas: understanding the essay's structure and flow (Discourse Analysis), identifying main arguments and details (Topic Extraction), evaluating argument logic and relevance (Logic Detection), and providing an overall coherence rating (Coherence Rating). The feedback should be specific, constructive, and aimed at guiding students to improve their writing skills.
Organizer: East China Normal University, Microsoft Research Asia
Contact: Yuhao Zhou (yhzhou@stu.ecnu.edu.cn), Hongyi Wu, Xinshu Shen, Man Lan, Yuanbin Wu, Xiaopeng Bai, Shaoguang Mao, Tao Ge, Yan Xia
◇ Task 5 - Argument Mining for Chinese Argumentative Essay
Argumentative essay writing is an important way to promote the development and enhancement of students' higher-order thinking skills at the high school level, and is a key form of discursive expression. By mining the arguments in their argumentative essays, students can better verify the reliability of evidence and identify sound reasoning processes, improving their logical expression skills. At the same time, teachers can assess the effectiveness of their teaching methods by analyzing the level of students' scientific argumentation, and adjust course design and teaching methods according to the evaluation results to improve the quality of education. Identifying argument components is therefore important for the assessment of argumentative essays. This task takes a whole argumentative essay as the input unit and categorizes each sentence into four coarse-level categories: assertion, evidence, elaboration, and others. Considering the variety of argumentative techniques used in students' compositions, the task further categorizes assertion into main claim, claim, and restated claim, and classifies evidence into fact, anecdote, quotation, proverb, and axiom.
Organizer: East China Normal University
Contact: Zheqin Yin (chaos_yin@qq.com), Yupei Ren, Man Lan, Xiaopeng Bai, Yuanbin Wu, Aimin Zhou
◇ Task 6 - Scientific Literature Survey Generation
Given a list of topic-related references, the task aims to develop models for long-form literature survey generation. We provide a training set of randomly crawled arXiv survey papers together with their references; participants are expected to build retrieval-augmented generation models for survey generation. Both automatic and human evaluation will be conducted to measure survey outline and content quality. There is no limit on the use of pre-trained large language models; however, reverse engineering is not allowed.
Organizer: Kexin Technology, Deakin University
Contact: Yangjie Tian (rd_kexin@cnpiec.com.cn), Aijia Li, Ran Liu, Xungang Gu, He Zhang, Ming Liu
◇ Task 7 - Multilingual Medical Instructional Video Question Answering Challenge
Recently, the advent of online videos has revolutionized the way in which information and knowledge are obtained. Many people find instructional videos an effective and efficient way to teach or learn how to complete a specific task through a series of step-by-step procedures. In this context, a new task called Temporal Answering Grounding in Video (TAGV) has been proposed to find the video frame span (visual answer) corresponding to an input question. However, existing TAGV datasets were each created for a single language (i.e., English or Chinese), so it is also important to support the development of multilingual applications. To promote research on TAGV, we propose the Multilingual Medical Instructional Video Question Answering (MMIVQA) challenge. The datasets for this challenge contain high-quality Chinese and English medical instructional videos, with manual annotations provided by medical experts. The ultimate goal of this joint task is to develop a system that can provide a multilingual Q&A function with moment-to-moment video clips for first aid, medical emergencies, or medical education.
This shared task includes three tracks:
● Track 1: Multilingual Temporal Answering Grounding in Single Video (mTAGSV).
● Track 2: Multilingual Video Corpus Retrieval (mVCR).
● Track 3: Multilingual Temporal Answering Grounding in Video Corpus (mTAGVC).
Organizer: Hunan University, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, and Hunan Normal University
Contact: Bin Li (libincn@hnu.edu.cn), Yixuan Weng, Qiya Song
◇ Task 8 - Multi-Modal Entity Set Expansion
The Multi-Modal Entity Set Expansion task aims to expand a small set of seed entities with new entities belonging to the same fine-grained semantic class. To achieve this objective, the model must effectively leverage textual and visual modal information to comprehend entities and grasp the fine-grained semantic classes of the initial seed entity set.
Organizer: Tsinghua University
Contact: Haitao Zheng (zheng.haitao@sz.tsinghua.edu.cn), Tingwei Lu (ltw23@mails.tsinghua.edu.cn), Yangning Li (yn-li23@mails.tsinghua.edu.cn)
◇ Task 9 - Metaphor Generation
This task is designed to generate Chinese metaphors using machine learning techniques, by effectively identifying either the ground or the vehicle in the metaphoric relation.
This shared task includes two tracks:
● Track 1: Ground Identification: In this track, the objective is to generate a metaphor given a pair of TENOR and VEHICLE. The model should identify and produce a metaphor that effectively connects and compares the subject (TENOR) with the object (VEHICLE).
● Track 2: Vehicle Identification: The second track focuses on creating a metaphor from a provided TENOR and a potential GROUND, which represents the characteristics of the TENOR that the metaphor aims to highlight. The model should produce a metaphor that emphasizes these features.
Organizer: Hong Kong University of Science and Technology, Massachusetts Institute of Technology, University of Manchester, University of California, San Diego
Contact: Xingwei Qu, Xinrong Yao, Ge Zhang (gezhang@umich.edu), Jie Fu (jiefu@ust.hk), Chenghua Lin, Yujie Shao (yus085@ucsd.edu)
◇ Task 10 - Regulating Large Language Models
The rise of large language models has brought about significant advancements in the field of natural language processing. However, these models often have the potential to generate content that is hallucinatory or toxic. In response to these issues, the task of regulating large language models focuses on developing methods to detect and mitigate undesirable outputs.
This shared task includes three tracks:
● Track 1: Multimodal Hallucination Detection for Multimodal Large Language Models (Closed-Source Model Group): Develop methods to identify and flag hallucinatory outputs that do not correlate with reality or the given input context when dealing with multimodal prompts (text, images, video, etc.). This track would involve creating detection algorithms that can discern between accurate and hallucinated responses across different modalities, thereby ensuring the reliability of the model's outputs. Closed-source LLMs (e.g., GPT-4) can be used in the pipeline.
● Track 2: Multimodal Hallucination Detection for Multimodal Large Language Models (Open-source Model Group): Develop methods to identify and flag hallucinatory outputs that do not correlate with reality or the given input context when dealing with multimodal prompts (text, images, video, etc.). This track would involve creating detection algorithms that can discern between accurate and hallucinated responses across different modalities, thereby ensuring the reliability of the model's outputs. Closed-source models cannot be used in the pipeline.
● Track 3: Detoxifying Large Language Models: Design and implement strategies to prevent large language models from generating toxic, biased, or harmful content. This track focuses on developing filters, fine-tuning techniques, or other mechanisms to recognize and suppress malicious responses before they reach the user. The goal is to maintain the utility and fluency of the model while ensuring that the content it produces adheres to community guidelines and ethical standards.
Organizer: Zhejiang University, National University of Singapore
Contact: Ningyu Zhang, Xiang Chen, Mengru Wang (mengruwg@zju.edu.cn), Chenxi Wang, Ziwen Xu, Shumin Deng
◇ Task 1 - Visual Chinese Character Checking
Please fill out the Shared Task 1 Registration Form (Word File) and send it to the following registration email.
Registration Email: liyinghu20@mails.tsinghua.edu.cn
◇ Task 2 - Nominal Compound Chain Extraction
Please fill out the Shared Task 2 Registration Form (Word File) and send it to the following registration email.
Registration website: https://ncce-site.pages.dev/
Registration Email: haofei37@nus.edu.sg
◇ Task 3 - Dialogue-level Coreference Resolution and Relation Extraction
Please fill out the Shared Task 3 Registration Form (Word File) and send it to the following registration email.
Registration Email: yiyunxiong@whu.edu.cn
◇ Task 4 - Chinese Essay Discourse Logic Evaluation and Integration
Please fill out the Shared Task 4 Registration Form (Word File) and send it to the following registration email.
Registration Email: yhzhou@stu.ecnu.edu.cn
◇ Task 5 - Argument Mining for Chinese Argumentative Essay
Please fill out the Shared Task 5 Registration Form (Word File) and send it to the following registration email.
Registration Email: chaos_yin@qq.com
◇ Task 6 - Scientific Literature Survey Generation
Please fill out the Shared Task 6 Registration Form (Word File) and send it to the following registration email.
Registration Email: rd_kexin@cnpiec.com.cn
◇ Task 7 - Multilingual Medical Instructional Video Question Answering Challenge
Please fill out the Shared Task 7 Registration Form (Word File) and send it to the following registration email.
Registration website: https://cmivqa.github.io/
Registration Email: libincn@hnu.edu.cn
◇ Task 8 - Multi-Modal Entity Set Expansion
Please fill out the Shared Task 8 Registration Form (Word File) and send it to the following registration email.
Registration Email: ltw23@mails.tsinghua.edu.cn
◇ Task 9 - Metaphor Generation
Please fill out the Shared Task 9 Registration Form (Word File) and send it to the following registration email.
Registration Email: gezhang@umich.edu
◇ Task 10 - Regulating Large Language Models
Please fill out the Shared Task 10 Registration Form (Word File) and send it to the following registration email.
Registration Email: mengruwg@zju.edu.cn
2024/03/25: announcement of shared tasks and call for participation;
2024/03/25: registration open;
2024/04/15: release of detailed task guidelines & training data;
2024/05/25: registration deadline;
2024/06/11: release of test data;
2024/06/20: participants’ results submission deadline;
2024/06/30: evaluation results release and call for system reports and conference papers;
2024/07/20: conference paper submission deadline (only for shared tasks);
2024/08/09: conference paper accept/reject notification;
2024/08/23: camera-ready paper submission deadline.
Evaluation papers must be written in English. Accepted papers will appear in the proceedings of the NLPCC 2024 conference, published as a volume in the Springer LNAI series (EI & ISTP indexed). Submissions should follow the LNCS formatting instructions and must be formatted in accordance with the standard Springer style sheets ([LaTeX][Microsoft Word]). The maximum paper length is 12 pages, including references. All submissions should be prepared in PDF format.
Submissions must be made electronically via the OpenReview platform at https://openreview.net/group?id=ccf.org/NLPCC/2024/Shared_Tasks. Note that submissions sent by email will not be considered.
Xiaopeng Bai, East China Normal University
Yunbo Cao, PLA Academy of Military Science
Xiang Chen, Zhejiang University
Shumin Deng, National University of Singapore
Hao Fei, National University of Singapore
Jie Fu, Hong Kong University of Science and Technology
Tao Ge, Microsoft Research Asia
Xungang Gu, Kexin Technology
Donghong Ji, Wuhan University
Man Lan, East China Normal University
Aijia Li, Kexin Technology
Bin Li, Hunan University
Bobo Li, Wuhan University
Fei Li, Wuhan University
Yangning Li, Tsinghua University
Yinghui Li, Tsinghua University
Chenghua Lin, University of Manchester
Ming Liu, Deakin University
Ran Liu, Kexin Technology
Tingwei Lu, Tsinghua University
Shaoguang Mao, Microsoft Research Asia
Xingwei Qu, Hong Kong University of Science and Technology
Yupei Ren, East China Normal University
Yujie Shao, University of California, San Diego
Xinshu Shen, East China Normal University
Qiya Song, Hunan Normal University
Chong Teng, Wuhan University
Yangjie Tian, Kexin Technology
Chenxi Wang, Zhejiang University
Mengru Wang, Zhejiang University
Yixuan Weng, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
Hongyi Wu, East China Normal University
Yuanbin Wu, East China Normal University
Yan Xia, Microsoft Research Asia
Yiyun Xiong, Wuhan University
Zishan Xu, Tsinghua University
Ziwen Xu, Zhejiang University
Xinrong Yao, Massachusetts Institute of Technology
Zheqin Yin, East China Normal University
Ge Zhang, Hong Kong University of Science and Technology
He Zhang, Kexin Technology
Meishan Zhang, Harbin Institute of Technology (Shenzhen)
Ningyu Zhang, Zhejiang University
Hai-Tao Zheng, Tsinghua University
Aimin Zhou, East China Normal University
Yuhao Zhou, East China Normal University