NLPCC 2025 will continue the tradition of previous NLPCC conferences by hosting several shared tasks in natural language processing and Chinese computing. This year, the shared tasks cover both classic problems and newly emerging challenges: LLM-Generated Text Detection; Evaluation of On-Topic Graded Comments for Essays; Comprehensive Argument Analysis for Chinese Argumentative Essays; the Multi-modal, Multilingual, and Multi-hop Medical Instructional Video Question Answering Challenge; Chinese Government Text Correction with Knowledge Bases; the Faithful Bilingual Long-context Retrieval and Generation Challenge; the Chinese Corpus for Gender Bias Detection, Classification, and Mitigation Challenge; and Personalized Emotional Support Conversation.
Participants from both academia and industry are welcome. Each group may participate in one or more tasks. Participants will be invited to submit papers to the main conference, and accepted papers will appear in the conference proceedings.
The top 3 participating teams of each task will be certified by NLPCC and CCF-NLP. If a task has multiple sub-tasks or tracks, only the top participating team of each sub-task/track will be certified.
There are eight shared tasks at this year's NLPCC conference; a detailed description of each task can be found in the task guidelines, to be released later. Here we provide a brief overview of each task.
◇ Task 1 - LLM-Generated Text Detection
Detecting text generated by large language models (LLMs) has become a critical challenge, driven by concerns over hallucinated information, harmful content, and the risks of misuse. This task aims to develop more robust detectors to address this issue. Participants are required to design detection algorithms based on the provided training data to differentiate between LLM-generated and human-written text. During the testing phase, the submitted detectors will undergo various stress tests that simulate real-world scenarios to comprehensively evaluate their practicality and robustness. Data augmentation on the provided training data is allowed, including operations such as cropping, splitting, word replacement, and format adjustment, or using models for paraphrasing while maintaining semantic consistency with the original data. However, introducing external knowledge or creating entirely new content is strictly prohibited. Paraphrasing is limited to open-source models and API-based models; the use of GPT-o1, MoE models, or models with parameter sizes exceeding 80B is not allowed. Participants must provide their paraphrasing prompts (if applicable) and submit all training data and scripts to ensure fairness and reproducibility.
Organizer: NLP2CT Lab, University of Macau
Contact: Derek Fai WONG, Junchao WU (nlp2ct.junchao@gmail.com), Runzhe ZHAN, Yulin YUAN
◇ Task 2 - Evaluation of Essay On-Topic Graded Comments
A student's ability to stay on topic in writing reflects an accurate understanding of the assigned topic. Therefore, when evaluating an essay, determining whether it revolves around the assigned topic is particularly important. Evaluating essay on-topic graded comments involves assessing whether an essay aligns with the given topic, categorizing it and assigning a score, as well as generating comments specifically focused on topic relevance. This is a challenging task, and it is divided into two tracks:
● Track 1: Assess whether the essay aligns with the topic by providing a classification result and a predicted score. The classification results include Excellent, Good, Average, Pass, and Fail; the predicted score ranges from 0 to 100.
● Track 2: Generate comments on whether the essay is closely aligned with the requirements of the topic, focusing only on the central idea and its relevance to the topic requirements.
Organizer: East China Normal University
Contact: Haoxiang DONG (hx_dong@stu.ecnu.edu.cn), Xiayu SUN, Man LAN, Xiaopeng BAI, Lixin Ye
◇ Task 3 - Comprehensive Argument Analysis for Chinese Argumentative Essay
Engaging in thorough analysis of arguments enables a profound comprehension of the logical framework within an argumentative essay, allowing for an accurate grasp of the author's line of reasoning and thought process. This, in turn, facilitates a more comprehensive understanding of the intrinsic connections between the ideas and arguments presented in the essay. This task not only focuses on identifying argumentative components (e.g., claims, evidence, and elaboration), but also extracts the argumentative relations between components (e.g., support). The dataset for this task consists of argumentative essays written by Chinese high school students, with manual annotations provided by relevant language experts. The ultimate goal of this task is to deepen comprehensive argument analysis and to promote the development of applications in the field of education.
This shared task includes two tracks:
● Track 1: Argument Component Detection: This track aims to build the fundamental argumentative structure by identifying the argumentative components. Given an essay as input, the output is expected to be the argumentative component category of each sentence block, indicating the role it serves within the text.
● Track 2: Argument Relation Identification: The objective of this track is to extract and classify the possible argumentative relations between argumentative components, thereby forming a comprehensive understanding of the author's argumentative logic.
Organizer: East China Normal University
Contact: Zheqin YIN (zqyin@stu.ecnu.edu.cn), Yupei REN, Man LAN, Xiaopeng BAI, Yuanbin WU, Aimin ZHOU
◇ Task 4 - Multi-modal, Multilingual, and Multi-hop Medical Instructional Video Question Answering Challenge
Designing models that can comprehend multi-modal (text, speech, and image/video) instructional videos in the medical domain, process multilingual data, and locate multi-hop questions in video is an emerging challenge. Following the successful first (NLPCC 2023, Foshan) and second (NLPCC 2024, Hangzhou) CMIVQA challenges, a new task has been introduced this year to further advance research in multi-modal, multilingual, and multi-hop (m^3) question answering systems, with a specific focus on medical instructional videos. This task focuses on evaluating models that can integrate information from medical instructional videos, understand multiple languages, and answer complex, multi-hop questions that require reasoning over various modalities. Participants are expected to develop algorithms capable of processing both video and text data, understanding multilingual queries, and providing relevant answers to multi-hop medical questions. Models will be evaluated on the relevance of their answers, as well as their ability to handle complex multi-modal and multilingual inputs.
The task consists of multiple stages, including training, testing, and evaluation, and contains three tracks: multi-modal, multilingual, and multi-hop Temporal Answer Grounding in Single Video (m^3TAGSV); multi-modal, multilingual, and multi-hop Video Corpus Retrieval (m^3VCR); and multi-modal, multilingual, and multi-hop Temporal Answer Grounding in Video Corpus (m^3TAGVC).
Organizer: Shenzhen Institutes of Advanced Technology, School of Computer Science and Technology, Beijing Institute of Technology
Contact: Bin LI (b.li2@siat.ac.cn), Shenxi LIU, Yixuan WENG, Shoujun ZHOU
◇ Task 5 - Chinese Government Text Correction with Knowledge Bases
Chinese text correction has developed rapidly in recent years, with machine learning-based correction algorithms significantly improving performance. However, current studies rarely consider using knowledge bases to guide correction, even though knowledge bases can help correct key factual errors and keep correction results consistent with changes in the knowledge bases. We have developed a dataset and knowledge bases specifically for the automatic correction of government documents based on real user inputs. The "Chinese Government Text Correction with Knowledge Bases" shared task aims to enhance Chinese text correction with the help of knowledge bases and to test the effectiveness of various methods in utilizing them.
Organizer: Zhengzhou University
Contact: Hongfei Xu (hfxunlp@foxmail.com), Yuxiang Jia, Lingling Mu, Yang Song
◇ Task 6 - Faithful Bilingual Long-context Retrieval and Generation Challenge
In the field of long-context models (LCMs), a key characteristic is the "retrieval-then-generation" paradigm: the model first implicitly identifies key information within the context and then generates based on the aggregated context. While significant progress has been made in developing long-context models, i.e., achieving strong performance on open-source long-context benchmarks, there is a need for a more fine-grained and accurate evaluation of their ability to retrieve relevant information and generate faithful outputs. Furthermore, while many open-source LCMs perform well on English-language tasks, their performance on Chinese-language tasks remains unsatisfactory, highlighting a significant gap in the field. To address these challenges, this shared task focuses on assessing two core capabilities of LCMs, retrieval and generation, in bilingual scenarios (Chinese and English). Participants are required to complete the task solely using the LCMs themselves, without relying on external modules such as retrieval-augmented generation (RAG). This shared task includes two tracks:
● Track 1: Long-context Retrieval Faithfulness (LRF). Given a query and its corresponding long context, the LCM must explicitly locate and output the necessary key information. This track evaluates the model's ability to accurately identify and extract relevant content from long contexts, assessing its retrieval capability without external assistance. The evaluation will be conducted from two dimensions: fine-grained retrieval (sentence-level) and coarse-grained retrieval (paragraph-level).
● Track 2: Long-context Generation Faithfulness (LGF). This track focuses on the faithfulness of the model's generated outputs. Given a long context that may include information conflicting with real-world knowledge or the model's internal knowledge, such as recent news or updated events, the LCM must strictly adhere to the provided context to generate outputs. This track evaluates the model's ability to generate responses that are both contextually accurate and faithful, without relying on internal knowledge or external retrieval modules. The use of inference-scaling techniques is allowed to enhance the model's performance.
Organizer: Soochow University, PLA Academy of Military Sciences
Contact: Zecheng Tang (zctang@stu.suda.edu.cn), Baibei Ji, Haitian Wang, Ruoxi Sun, Juntao Li, Wenpeng Hu
◇ Task 7 - CORGI-PM 2025: Chinese Corpus for Gender Bias Detection, Classification and Mitigation Challenge
As artificial intelligence and natural language processing systems become increasingly integrated into our daily lives, addressing gender bias in these systems has become crucial for ensuring fair and equitable technology. Following the groundbreaking work of CORGI-PM, which established the first sentence-level Chinese corpus for gender bias probing and mitigation, we propose a comprehensive shared task to advance the state-of-the-art in automatic gender bias detection, classification, and mitigation in Chinese text.
The challenge consists of three tracks that progressively build upon each other:
● Track 1: Gender Bias Detection: Binary classification task to identify whether a given Chinese text contains gender bias. Evaluation metrics: Precision, Recall, F1-score.
● Track 2: Gender Bias Classification: Multi-label classification of identified gender bias into three categories: Gender Stereotyped Activity and Career Choices (AC), Gender Stereotyped Descriptions and Inductions (DI), Expressed Gender-stereotyped Attitudes, Norms and Beliefs (ANB). Evaluation metrics: Class-wise and macro-averaged Precision, Recall, F1-score.
● Track 3: Gender Bias Mitigation: Natural language generation task to rewrite biased text while preserving core meaning. Evaluation metrics: BLEU, METEOR, and ROUGE-L.
Dataset: the training set is an enhanced version of CORGI-PM with 30,000+ annotated sentences; the development set contains 1,500 sentences; the hidden test set contains 1,500 sentences. All data are manually annotated by qualified annotators with relevant educational backgrounds.
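For reference, the class-wise and macro-averaged metrics used in Tracks 1 and 2 can be computed as sketched below. This is an illustrative example only, not an official scoring script; the label names (AC, DI, ANB) follow the track description, while the toy gold/predicted label sets are invented for demonstration:

```python
# Class-wise and macro-averaged precision/recall/F1 for multi-label
# gender-bias classification (labels follow Track 2: AC, DI, ANB).
LABELS = ["AC", "DI", "ANB"]

def prf(gold, pred, label):
    """Precision, recall, and F1 for one label over multi-label sets."""
    tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
    fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
    fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def macro_prf(gold, pred):
    """Macro-average: the unweighted mean of the class-wise scores."""
    scores = [prf(gold, pred, lab) for lab in LABELS]
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))

if __name__ == "__main__":
    # Toy data: each item is the set of bias categories in one sentence.
    gold = [{"AC"}, {"DI", "ANB"}, {"AC", "DI"}, set()]
    pred = [{"AC"}, {"DI"}, {"AC", "ANB"}, {"DI"}]
    for lab in LABELS:
        print(lab, prf(gold, pred, lab))
    print("macro P/R/F1:", macro_prf(gold, pred))
```

A production evaluation would typically rely on an established implementation (e.g., scikit-learn's `precision_recall_fscore_support` with `average="macro"`), but the arithmetic is the same.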
Organizer: NLP Group, University of Manchester, ByteDance Seed
Contact: Yizhi Li (yizhi.li@hotmail.com), Ge Zhang, Chenghua Lin, Wenhao Huang
◇ Task 8 - Personalized Emotional Support Conversation
Emotional Support Conversation (ESC) systems are designed to alleviate users' emotional distress and assist them in managing various challenges. However, traditional ESC systems often generate generic responses that fail to address users' unique, personalized needs. To address this issue, a new task called Personalized Emotional Support Conversation (PESC) is introduced. PESC systems consider the personal traits of users to generate emotionally supportive responses tailored to various individuals. The effectiveness of these systems is assessed using both automatic metrics and LLM-based metrics, ensuring a comprehensive evaluation.
Organizer: Harbin Institute of Technology, Shenzhen
Contact: Bingbing Wang (bingbing.wang@stu.hit.edu.cn), Zhengda Jin, Geng Tu, Ruifeng Xu
◇ Task 1 - LLM-Generated Text Detection
Please fill out the Shared Task 1 Registration Form (Word File) and send it to the following registration email.
Registration Email: nlp2ct.junchao@gmail.com
◇ Task 2 - Evaluation of Essay On-Topic Graded Comments
Please fill out the Shared Task 2 Registration Form (Word File) and send it to the following registration email.
Registration Email: hx_dong@stu.ecnu.edu.cn
◇ Task 3 - Comprehensive Argument Analysis for Chinese Argumentative Essay
Please fill out the Shared Task 3 Registration Form (Word File) and send it to the following registration email.
Registration Email: zqyin@stu.ecnu.edu.cn
◇ Task 4 - Multi-modal, Multilingual, and Multi-hop Medical Instructional Video Question Answering Challenge
Please fill out the Shared Task 4 Registration Form (Word File) and send it to the following registration email.
Registration Email: b.li2@siat.ac.cn
◇ Task 5 - Chinese Government Text Correction with Knowledge Bases
Please fill out the Shared Task 5 Registration Form (Word File) and send it to the following registration email.
Registration Email: hfxunlp@foxmail.com
◇ Task 6 - Faithful Bilingual Long-context Retrieval and Generation Challenge
Please fill out the Shared Task 6 Registration Form (Word File) and send it to the following registration email.
Registration Email: zctang@stu.suda.edu.cn
◇ Task 7 - CORGI-PM 2025: Chinese Corpus for Gender Bias Detection, Classification and Mitigation Challenge
Please fill out the Shared Task 7 Registration Form (Word File) and send it to the following registration email.
Registration Email: yizhi.li@hotmail.com
◇ Task 8 - Personalized Emotional Support Conversation
Please fill out the Shared Task 8 Registration Form (Word File) and send it to the following registration email.
Registration Email: 220110515@stu.hit.edu.cn (C.C. bingbing.wang@stu.hit.edu.cn, 22b951011@stu.hit.edu.cn)
2025/02/17: announcement of shared tasks and call for participation;
2025/02/17: registration open;
2025/02/28: release of detailed task guidelines & training data;
2025/03/25: registration deadline;
2025/04/11: release of test data;
2025/04/20: participants' results submission deadline;
2025/04/30: release of evaluation results and call for system reports and conference papers;
2025/05/22: conference paper submission deadline (shared tasks only);
2025/06/12: conference paper acceptance notification;
2025/06/25: camera-ready paper submission deadline.
Evaluation papers must be written in English and will be included in the proceedings of the NLPCC 2025 conference, published as a volume in the Springer LNAI series (EI & ISTP indexed). Submissions should follow the LNCS formatting instructions. The maximum length for papers is 12 pages, including references and appendices. Submissions must adhere to the standard Springer style sheets ([LaTeX][Microsoft Word]). All papers should be prepared in PDF format.
Submissions must be made electronically via the OpenReview platform. Note that submissions sent by email will not be considered.
Xiaopeng Bai, East China Normal University
Yunbo Cao, PLA Academy of Military Sciences
Haoxiang Dong, East China Normal University
Wenpeng Hu, PLA Academy of Military Sciences
Wenhao Huang, University of Manchester, ByteDance Seed
Baibei Ji, Soochow University
Tao Ji, Fudan University
Yuxiang Jia, Zhengzhou University
Zhengda Jin, Harbin Institute of Technology, Shenzhen
Man Lan, East China Normal University
Bin Li, Chinese Academy of Sciences, School of Computer Science and Technology
Juntao Li, Soochow University
Yizhi Li, University of Manchester; ByteDance Seed
Chenghua Lin, University of Manchester; ByteDance Seed
Shenxi Liu, Chinese Academy of Sciences, School of Computer Science and Technology
Lingling Mu, Zhengzhou University
Yupei Ren, East China Normal University
Yang Song, Zhengzhou University
Ruoxi Sun, Soochow University
Xiayu Sun, East China Normal University
Zecheng Tang, Soochow University
Geng Tu, Harbin Institute of Technology, Shenzhen
Bingbing Wang, Harbin Institute of Technology, Shenzhen
Haitian Wang, Soochow University
Yixuan Weng, Chinese Academy of Sciences, School of Computer Science and Technology
Derek Fai WONG, University of Macau
Junchao Wu, University of Macau
Yuanbin Wu, East China Normal University
Hongfei Xu, Zhengzhou University
Ruifeng Xu, Harbin Institute of Technology, Shenzhen
Lixin Ye, East China Normal University
Zheqin Yin, East China Normal University
Yulin Yuan, University of Macau
Runzhe Zhan, University of Macau
Ge Zhang, University of Manchester; ByteDance Seed
Aimin Zhou, East China Normal University
Shoujun Zhou, Chinese Academy of Sciences, School of Computer Science and Technology