NLPCC 2017 Shared Tasks: Call for Participation

The conference on Natural Language Processing and Chinese Computing (NLPCC) is the annual conference of CCF TCCI (Technical Committee of Chinese Information, China Computer Federation). The NLPCC conferences have been successfully held in Beijing (2012), Chongqing (2013), Shenzhen (2014), Nanchang (2015) and Kunming (2016). This year’s NLPCC conference will be held in Dalian on November 8-12, 2017.

NLPCC 2017 will follow the NLPCC tradition of holding several shared tasks in natural language processing and Chinese computing. This year’s shared tasks focus on both classic problems and newly emerging problems, including Chinese Word Semantic Relation Classification, News Headline Categorization, Single Document Summarization, Emotional Conversation Generation, Open Domain Question Answering, and Social Media User Modeling.

Participants from both academia and industry are welcome. Each group can participate in one or more tasks, and members of each group can attend the NLPCC conference to present their techniques and results. Participants will be invited to submit papers to the main conference, and accepted papers will appear in the conference proceedings published by Springer LNCS.

1. Overview of the Shared Tasks

There are six shared tasks in this year’s NLPCC conference. The details of each task can be found in the task guidelines document; here we give only a brief overview of each task.

◇ Chinese Word Semantic Relation Classification

This task focuses on lexical semantics and aims to evaluate techniques for the automatic classification of Chinese word semantic relations. Given a pair of Chinese words, a system must classify the pair into one of the following semantic relations: synonym (e.g., 计算机-电脑), antonym (e.g., 上涨-下降), hyponym (e.g., 食材-红薯), meronym (e.g., 汽车-发动机) and person-affiliation (e.g., 医生-医院).
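As a rough illustration of the input and output of this task, the five relations and the example pairs above can be written as a small Python data structure. The names `Relation`, `WordPair` and `EXAMPLES` are hypothetical and chosen only for this sketch; the official data format is defined in the task guidelines and may differ.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """The five relation labels listed in the task overview."""
    SYNONYM = "synonym"
    ANTONYM = "antonym"
    HYPONYM = "hyponym"
    MERONYM = "meronym"
    PERSON_AFFILIATION = "person-affiliation"


@dataclass(frozen=True)
class WordPair:
    """One instance: two Chinese words plus the relation between them."""
    first: str
    second: str
    relation: Relation


# The example pairs quoted in the overview, one per relation.
EXAMPLES = [
    WordPair("计算机", "电脑", Relation.SYNONYM),
    WordPair("上涨", "下降", Relation.ANTONYM),
    WordPair("食材", "红薯", Relation.HYPONYM),
    WordPair("汽车", "发动机", Relation.MERONYM),
    WordPair("医生", "医院", Relation.PERSON_AFFILIATION),
]
```

A participating system would then map each unlabeled pair (`first`, `second`) to exactly one `Relation` value.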

◇ News Headline Categorization

This task aims to evaluate the automatic classification techniques for very short texts, i.e., Chinese news headlines. Each news headline (i.e., news title) is required to be classified into one or more predefined categories.

◇ Single Document Summarization

This task provides a dataset for single document summarization of Chinese news articles, to evaluate and compare different document summarization techniques.

◇ Emotional Conversation Generation

As a vital part of human intelligence, emotional intelligence is defined as the ability to perceive, integrate, understand, and regulate emotions. Though a variety of models have been proposed for conversation generation from large-scale social data, generating emotional responses remains a challenging and largely unaddressed problem. In this shared task, participants are expected to generate Chinese responses that are not only appropriate in content but also adequate in emotion, which is important for building an empathic chatting machine. For instance, if a user says “My cat died yesterday”, the most appropriate response may be “It’s so sad, so sorry to hear that” to express sadness, but it could also be “Bad things always happen, I hope you will be happy soon” to express comfort.

◇ Open Domain Question Answering

The open domain QA evaluation includes three tasks: knowledge-based question answering (KBQA), document-based question answering (DBQA), and table-based question answering (TBQA). The task of KBQA is to answer Chinese factoid questions based on a given structured knowledge base. The task of DBQA is to answer Chinese questions by selecting one or more sentences from a given document as answers. In addition to the training and test sets released in NLPCC-ICCPOL 2016, this year we will provide a new test set for each of these two tasks. TBQA is an entirely new QA task, which aims to answer English questions by retrieving one or more tables from a table collection as answers. We hope our datasets and evaluation can promote the development of QA research in China.
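To make the DBQA setting concrete, the sketch below shows its input and output shape: a question and the sentences of a document go in, and the selected answer sentence(s) come out. The character-overlap scoring is only a naive stand-in chosen for illustration, not a method recommended or used by the organizers, and the function names and the toy question and sentences are hypothetical.

```python
from typing import List


def char_overlap(question: str, sentence: str) -> float:
    """Fraction of the question's distinct characters that occur in the sentence."""
    q_chars = {ch for ch in question if not ch.isspace()}
    if not q_chars:
        return 0.0
    return len(q_chars & set(sentence)) / len(q_chars)


def select_answer_sentences(question: str, sentences: List[str], top_k: int = 1) -> List[str]:
    """Return the top_k sentences ranked by character overlap with the question.

    sorted() is stable, so sentences with equal scores keep document order.
    """
    ranked = sorted(sentences, key=lambda s: char_overlap(question, s), reverse=True)
    return ranked[:top_k]


# Toy input, invented for this sketch (not taken from the task data).
question = "大连在哪个省"
document = ["大连位于辽宁省。", "北京是中国的首都。"]
answers = select_answer_sentences(question, document)
```

Any real submission would replace `char_overlap` with a proper ranking model; the point here is only the sentence-selection interface.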

◇ Social Media User Modeling

User modeling on social media is essential for business decisions such as user segmentation and targeted advertising. Since user behavioral data on social media is heterogeneous, it is still challenging to leverage this information effectively for user modeling. This task provides a social media dataset that includes the following heterogeneous information: users’ profiles (such as gender and age), social ties (following relationships), users’ tags, users’ published tweets, and users’ location visits. The user modeling task includes the following two subtasks: 1) Interested Location Prediction: given some of a user’s historical location visits and the other provided information, predict which locations the user will be interested in visiting in the future. 2) User Profiling: given all of a user’s information except the profile, predict the user’s profile information.

2. How to Participate

Please fill out the registration form and send it to the coordinator, Fang Liu (刘芳), by email (contact@nlpcc2017.info) before April 30, 2017.

If you have any questions about the shared tasks, please do not hesitate to contact us by email.

3. Important Dates

2017/3/6: announcement of shared tasks and call for participation;

2017/3/31: release of detailed task guidelines and sample data;

2017/4/30: registration deadline;

2017/5/30: test data release;

2017/6/5: participants’ results submission deadline;

2017/6/15: evaluation results release and call for system reports and conference papers;

2017/7/15: conference paper submission deadline (only for shared tasks);

2017/8/20: conference paper accept/reject notification.